SELECTIVE FACILITATION AND INTERFERENCE
IN RETENTION OF PROSE¹


GORDON H. BOWER²
Stanford University 


A prose passage may be described at the level of asserted relations 
among general semantic categories or at the level of predications 
about specific details (e.g., names, dates, places). This experiment 
used interpolated learning to facilitate retention of the conceptual 
macrostructure of an originally learned passage while simultaneously 
interfering with retention of the originally learned passage's de- 
tailed microstructure. The subjects originally learned a short biog- 
raphy; the experimental subjects then learned two more biographies 
of similar conceptual format which had one third of the details 
changed. Later, cued recall of changed details of the originally 
learned passage showed retroactive interference and many intrusion 
errors. Free recall of the originally learned passage showed de- 
pressed recall of changed details and enhanced recall of unchanged 
details, but equal facilitation in recall of the conceptual
macrostructure.


This experiment investigates interference
processes in retention of meaningful prose. 
Although earlier investigations of retroactive 
interference with prose materials typically 
found weak effects, strong effects have been 
obtained recently in experiments that have 
taken more care in arranging optimal inter- 
fering conditions and in analyzing appropri- 
ate aspects of recall in accordance with 
interference principles established in the 
typical “nonprose” experiments (e.g., Ander- 
son & Myrow, 1971; Crouse, 1971; Myrow 
& Anderson, 1972). The present experiment 
extends this analytic approach to further 
interference phenomena in prose learning. 
Previous investigations suggest that one 
can devise interpolated texts which when 
learned can produce either retroactive 
interference, retroactive facilitation, or “no 


1 This research was supported by grant
MH13950-06 to the author from the National 
Institute of Mental Health. Robert Rothbart and 
John Anderson assisted in the collection and 
analysis of these data. The author wishes to 
thank James Crouse for making his experimental 
materials available to him and Richard Anderson 
for several unpublished (preprinted) manuscripts. 

2 Requests for reprints should be sent to Gordon
H. Bower, Department of Psychology, Stanford
University, Stanford, California 94305.


effect” with respect to the retention of an
originally learned passage. To illustrate, 
suppose that an originally learned passage 
is a short biography of a fictitious poet 
named John Payton and it contains the 
complex sentence, “Payton’s father worked 
as a blacksmith but he died of diphtheria 
when John was only five years old.” Con- 
sider a counterpart sentence that occurs in 
the interpolated passage, another biography 
about another fictitious poet, Robert Fowler: 
“Fowler’s father worked as a blacksmith
but he died of lung cancer when Robert was 
only three years old.” Comparing these 
sentences, some details have remained the 
same whereas others have changed. As- 
suming initial learning of the first sentence, 
the second sentence should help the subject 
retain the original fact that the poet’s father 
was a blacksmith, because that predicate 
construction is repeated for the main char- 
acter. However, the disease that killed the 
father and the son’s age when his father 
died are details that have been changed. 
For these, one expects response competition 
and retroactive interference when test 
questions explicitly probe for recall of their 
originally learned values. These two cases 
illustrate retroactive facilitation and retroactive
interference, respectively. The neutral
or noninterference condition would have,
in the interpolated passage, a sentence that 
mentioned neither the subjects nor the 
predicates of the original passage. A given 
interpolated passage will have certain 
proportions of facilitating, ineffective, and 
interfering constructions with respect to an 
originally learned text, and its net effect 
upon originally learned retention should 
vary systematically with the proportions of 
the three constructions among the test 
items for originally learned retention. Our 
investigation attempts to validate this 
commonsense analysis. 

A second issue we investigated is the 
recall of the conceptual macrostructure of the 
passage; this refers to the main conceptual 
categories and relations (or gist) of the 
passage without regard to the detail carried 
by names, dates, places, and numbers. Thus, 
the macrostructure of the two sentences 
above is that “the main character’s father 
had some occupation but he died of some 
disease when the character was only some 
years old.” The “somes” denote variables 
into which different instantiating details 
are inserted to produce a facilitating or 
interfering sentence for the interpolated 
passage. This formulation suggests that re- 
call of the categories of an originally learned 
passage may be enhanced by an interpolated 
passage that alters and interferes with 
memory of many of the originally learned 
details. The following experiment attempts 
to create this pattern, that is, retroactive 
inhibition in recall of the detailed micro- 
structure of the passage, while at the same 
time enhancing retention of its conceptual 
macrostructure. 
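
The slot-and-filler description above can be sketched in a few lines of Python. The sketch is illustrative only: the slot names (occupation, disease, age) and the helper classify_slots are assumptions introduced here, not part of the experimental procedure, and the fillers are taken from the two example sentences about Payton and Fowler.

```python
# Macrostructure as a set of slots; each passage supplies its own fillers.
# Slot names and this helper are illustrative assumptions, not the scoring
# procedure used in the experiment.
ORIGINAL = {"occupation": "blacksmith", "disease": "diphtheria", "age": "five"}
INTERPOLATED = {"occupation": "blacksmith", "disease": "lung cancer", "age": "three"}


def classify_slots(original, interpolated):
    """Label each slot by the effect expected on recall of the original detail."""
    effects = {}
    for slot, detail in original.items():
        if slot not in interpolated:
            effects[slot] = "neutral"        # slot never mentioned again
        elif interpolated[slot] == detail:
            effects[slot] = "facilitation"   # same detail repeated
        else:
            effects[slot] = "interference"   # competing detail in the same slot
    return effects


EXPECTED_EFFECTS = classify_slots(ORIGINAL, INTERPOLATED)
```

On this reading, every slot that reappears in the interpolated sentence repeats the macrostructure whatever its filler, which is why recall of the categories may improve even where recall of the details suffers.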

We do this by use of unprompted or “free” 
recall, examining the conceptual relations 
recalled as well as the accuracy of the de- 
tails instantiating those categories. Provided 
that the subject guesses when uncertain 
among competing answers, we should find 
retroactive facilitation in his recall of the 
conceptual macrostructure of the passage. 
In addition, facilitation or interference 
should be observed for recall of specific
details of the originally learned passage,
depending on whether these are repeated or
changed in the interpolated text. These
“detail-specific” effects should be as apparent
in accuracy of cued-recall as in free-recall
measures of originally learned retention.
The following experiment tested these
interlocking expectations.


METHOD 


Two groups of undergraduates (13 in each 
group, approximately half male and half female) 
served as subjects and were paid $1.75 for one 
hour’s service. Each subject learned three suc- 
cessive text passages and then received a final 
retention test over the initial two passages. All 
subjects originally learned the biography of a 
fictitious poet, John Payton. This passage, 
adapted from Crouse (1971), was arranged into 
15 distinct sentences totaling 173 words. The 
first 4 sentences are illustrative: 


John Payton was one of the finest poets England 
has ever known. Payton was born in Northshire 
at the end of October, 1810. His father was a
servant who worked in the nearby town of 
Blackrock. When Payton was only five years 
old, his father was killed by a robber. 


These passages were shown to small groups of from 
2 to 4 subjects by means of an overhead projector. 
The individual sentences were shown one every 
5 seconds by sliding a cardboard “mask” down a
translucent sheet exposing the sentences one by 
one. The subjects were told to memorize each 
sentence but also to relate it to the overall text. 
After one study trial and a 10-second pause, the 
same originally learned passage was studied again. 
Following the second study trial, subjects read 
aloud in unison a slide of random digits for 20
seconds (to minimize short-term memory), and
then they were asked to free recall the 15 sentences
of the passage studied. Recall instructions were
somewhat vague: subjects were told that we were
interested in how much they could recall verbatim
(hence the 15 numbers down one side of the recall
sheet for the corresponding sentences); but they
were further urged to reproduce the gist or substance
of any facts they could not recall verbatim
and also to write any fact they could recall regardless
of whether they knew its order of presentation.
The subjects had 6 minutes to write
their free recall, and all finished by that time. 
We will refer to this as the immediate free-recall 
test of original learning. 

After collecting the free-recall sheets, the 
experimenter distributed the cued-recall sheet 
for the originally learned passage. This asked 20
questions (having one- to three-word answers) regarding
specific details of the passage just studied.
For example, for the Payton story, one question
asked for the occupation of Payton’s father,
another asked for Payton’s year of birth, another
asked how old Payton was at the time his father
died, etc. The subjects completed these cued-recall
sheets in 2.5 minutes. We will refer to this
as the cued-recall test for the originally learned 
passage. 

Upon arrival at the lab, subgroups of subjects 
had been assigned randomly to the experimental 
or control condition (subgroups of from two to 
four subjects in these two conditions were run
in random alternation). The two treatment groups
received different interpolated passages, each
learning and recalling two separate interpolated
passages (to enhance the hoped-for retroactive
interference). The experimental subjects learned 
two more biographies which were very similar to 
the one for John Payton except that the main 
character had a different name (Robert Fowler, 
then Richard Hughes) and 22 specific details of 
the originally learned passage were changed. To 
illustrate the changes, the first four sentences of 
the “Robert Fowler” passage were identical to
those for John Payton except for the details that 
Fowler was born in Hampstead in 1795 of a black- 
smith who was killed by a wolf when Fowler was 
eight years old. The third, the “Richard Hughes”
passage, changed the same factual details as did 
the second passage. So in comparing these passages 
to the originally learned passage, one can pin- 
point which details have remained the same and 
which have changed. Control subjects learned 
two unrelated passages of similar length and 
difficulty, the first describing the collections of the 
fictitious King Library, the second describing the 
geography and inhabitants of the fictitious island 
of Karisoon. These passages were also adapted 
from Crouse (1971). 

Learning of each interpolated passage followed 
the same procedure as the originally learned 
passage: two presentations of the single sentences 
of the passage, followed by 20-second digit read- 
ing, followed by a 6-minute written free recall, and 
then a 2.5-minute cued-recall test with 20 ques- 
tions regarding details. For experimental sub- 
jects, the 20 questions on these interpolated tests 
were identical to those used for the originally 
learned passage except that the name of the poet 
was changed. 

Following the cued-recall test on the last 
interpolated passage, retention of the originally 
learned passage was tested, yielding the data of 
primary interest. First, free recall was requested
of “the first passage, the one about John Payton.”
After 6 minutes, the subjects then received the 
cued-recall test for the originally learned passage 
(for 2.5 minutes), asking the same questions 
again. Because of time limitations, we were unable
to test retention in both ways for the interpolated
passages. However, a final cued recall
was obtained for the first interpolated passage, 
List 2. 


Protocol Scoring 


Each subject contributed four free-recall proto- 
cols and five cued-recall protocols. Cued recall 
was scored as correct or wrong and, if wrong, 
whether or not an intrusion error from an incorrect
list occurred. A lenient criterion for gist
recall was used. Free-recall scoring introduced 
several problems. We had to decide what were the 
parts of conceptualizations expressed in the text. 
These were basically of two types: (a) attributive 
or modificatory statements (e.g., “disease was 
cancer,” “John’s age was five years”) or (b) 
nouns serving as case-relation arguments in re- 
lation to the main verb—either as agent, recip- 
ient, time or location of the action (verb). For 
instance, the sentence, “his father was killed by
a robber,” has the two argument-relation pairs
stipulating “robber-killed” (agent-act) and
“killed-father” (act-recipient). Although one
might quibble over our exact segmentation of the
sentences into “idea units,” it is doubtful whether
the overall pattern of the results would be much 
affected by these identifications (see Paul, 1959). 

By this means, we identified 63 “idea units” 
in the originally learned passage; of these, 22 had 
been changed in detail in both of the interpolated 
passages (learned by the experimental subjects), 
whereas 41 had remained the same throughout the 
originally learned passage and both interpolated 
lists. Recalls of these two kinds of originally 
learned units were tabulated separately. Each 
idea unit of the originally learned list was scored 
according to whether it was recalled as a specific 
detail (e.g., that Payton’s age was five years at 
the time his father was killed) and according to 
whether the right general kind of fact was re- 
called (e.g., mentioning Payton’s age at the time 
his father was killed). We called this latter the 
subject’s general fact recall and the former his 
specific fact recall. These are correlated measures, 
since correct recall of a specific fact entails recall 
of the corresponding general fact. However, the 
scoring scheme permits a general fact to be re- 
called without its specific detail being correct 
(due to intrusions or simple mistakes). Scoring 
was lenient with respect to paraphrases and 
synonyms but nonetheless was quite reliable for 
two judges (scoring was not “blind” for either 
judge). 
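
The relation between the two measures can be made explicit with a short Python sketch. The function names score_idea_unit and recall_proportions and the boolean inputs are hypothetical; they stand in for the judges' decisions and are not the scoring materials actually used.

```python
def score_idea_unit(category_recalled, detail_correct):
    """Return (general, specific) scores, each 0 or 1, for one idea unit.

    category_recalled: the right kind of fact was mentioned (e.g., Payton's
        age when his father was killed), whatever value was given.
    detail_correct: the originally learned detail itself was given (e.g., five).
    """
    general = 1 if category_recalled else 0
    # A specific fact counts only when its general fact is also recalled.
    specific = 1 if (category_recalled and detail_correct) else 0
    return general, specific


def recall_proportions(scored_units):
    """Average the (general, specific) pairs over a set of idea units."""
    n = len(scored_units)
    general = sum(g for g, _ in scored_units) / n
    specific = sum(s for _, s in scored_units) / n
    return general, specific
```

Under this rule an intrusion, such as giving an interpolated age in place of the original one, scores as a general fact recalled without its specific fact, so the general proportion can never fall below the specific proportion.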


RESULTS 


Original Learning 


The degree of original learning for the 
three successive lists is shown by the im- 
mediate cued-recall proportions in Table 1. 
The experimentals and controls did not 
differ reliably in immediate recall on any 
list. This was true for free recall as well as 
for cued recall. There was a general practice 
effect from List 1. Although the experimentals
are slightly inferior to the controls
on Lists 2 and 3, a result possibly reflecting
negative transfer, that inference is unwarranted
due to a covarying (small) difference
on List 1 performance and also to
the fact that the Lists 2 and 3 which were
learned by the two groups differed, so there
is no assurance that the lists were of comparable
difficulty.


TABLE 1

CUED-RECALL PROPORTIONS ON THE IMMEDIATE TEST FOR LISTS 1, 2, AND 3 AND CUED-RECALL
PROPORTIONS AND INTRUSION PROPORTIONS ON THE DELAYED TEST FOR LISTS 1 AND 2

Condition/test    List 1                   List 2                   List 3
                  Experimental   Control   Experimental   Control   Experimental   Control
Cued recall
  Immediate       .54            .59       .71            .78       .08            .15
  Delayed         .40            .59       .59            .75
Intrusion
  Delayed         .10            --        .12            --


Forgetting: Cued Recall 


The cued-recall loss scores for Lists 1 and 
2 are calculated by subtracting the delayed
from the immediate cued-recall proportions 
in Table 1 (time did not permit a delayed 
test on List 3). For List 1, the experimental 
subjects had a specific loss of about 13.9%, 
whereas the controls forgot .4%. An overall 
analysis of variance was carried out on 
transformed (arcsin) recall proportions with 
experimental-control interpolation as a 
between-subjects factor and immediate- 
delayed testing as a within-subjects factor 
(see Winer, 1962, p. 302 ff.). As expected, 
that analysis revealed significant forgetting 
(F = 146, df = 1/24, p < 1) with 
reliably more forgetting by the experi- 
mental than the control subjects (inter- 
action, F = 11.9, df = 1/24, p < .01). 
Similar conclusions stem from comparing 
forgetting of List 2 items by the two groups 
(p < .01). 
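
The loss scores described above can be recomputed from the Table 1 entries, read here as proportions. The following Python sketch is a minimal illustration: the text does not state which form of the arcsine transformation was applied, so 2 * arcsin(sqrt(p)) is an assumption, and the names CUED_RECALL, loss, and arcsine_transform are introduced only for this example.

```python
import math

# Cued-recall proportions read from Table 1 as (immediate, delayed).
CUED_RECALL = {
    ("List 1", "experimental"): (0.54, 0.40),
    ("List 1", "control"): (0.59, 0.59),
    ("List 2", "experimental"): (0.71, 0.59),
    ("List 2", "control"): (0.78, 0.75),
}


def loss(immediate, delayed):
    """Loss score: immediate minus delayed proportion."""
    return immediate - delayed


def arcsine_transform(p):
    """Assumed variance-stabilizing transform, 2 * arcsin(sqrt(p))."""
    return 2.0 * math.asin(math.sqrt(p))


LOSSES = {cell: round(loss(*scores), 2) for cell, scores in CUED_RECALL.items()}
```

With the rounded table entries, the List 1 losses come out at .14 for the experimental subjects and .00 for the controls, close to the 13.9% and .4% quoted in the text, which were presumably computed from unrounded data.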

A revealing statistic in Table 1 is the 
proportion of intrusions of competing re- 
sponses from interfering lists learned by the 
experimental subjects (e.g., recalling a birth- 
date from List 2 or 3 while trying to recall 
the one stated in List 1). These competitive 
intrusions were a major determinant of the 
differential loss in the originally learned 
passage for the experimental subjects. 


Free Recall 


Specific facts. Free recall of specific factual 
details was examined first. The proportions
of specific originally learned facts recalled on
the immediate and delayed tests are shown
in Table 2, divided according to changed
versus unchanged facts and according to
experimental versus control treatments.
For the controls, the changed versus unchanged
set refers to those items in the
originally learned passage that were changed
or not changed in the interpolated passages
learned by the experimental subjects. The
total proportions for the changed and unchanged
scores combined are shown for
each cell, and the changes from the immediate
to the delayed-recall scores are
easily computed.
These data allow several interesting
comparisons whose discussion can begin
with the total scores. Much as with earlier
studies, these totals revealed no net retroactive
interference. An overall analysis of
variance on total free-recall proportions
(arcsin total) yielded no effect due to treatment
groups or to the interaction of treatment
with retention interval (F = 1.44,
df = 1/24, p > .10). Surprisingly, there
was even a slight increase in the total facts
recalled between the immediate and the
delayed retention test.
However, the detailed story of what happened
is told by separately considering recall
of the changed versus the unchanged
specific facts. Relative to the controls, the
experimental subjects forgot changed details
while increasing their recall of unchanged
(repeated) details. An analysis of
variance was performed on the transformed
proportions (see Table 2) with experimental
versus control as a between-subjects factor
and changed versus unchanged facts and
immediate versus delayed tests as two
within-subjects factors (see Winer, 1962, p.
319 ff.). The effects of chief interest appear
as interactions. First, the changed-unchanged
factor interacts with immediate
versus delayed testing (F = 104, df = 1/24,
p < .01), reflecting an increase in the unchanged
versus changed difference over the
retention interval. Second, there was a significant
triple interaction of the above two
factors with the experimental versus control
contrast (F = 34, df = 1/24, p < .01). As
expected, then, the results show retroactive
interference in recall of changed details and
facilitation for unchanged (repeated) details,
with the changes being greater for the
experimental than for the control subjects.
These specific contrasts were confirmed by
significant t tests. One main effect was
significant in the overall analysis (unchanged
facts were recalled better than
changed facts, even on the immediate test),
but that is a materials construction artifact
of no particular interest. No other effects
were significant in the overall analysis on
free recall of specific facts. Part of the loss
in recall of changed details was due to
intrusions of interpolated details while the
subject was attempting recall of the originally
learned detail (average of 1.77 per
subject).
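
The pattern for the experimental subjects can be checked directly from their Table 2 entries, read as proportions. The Python sketch below is illustrative only; in particular, pooling the 22 changed and 41 unchanged idea units with equal weight per unit is an assumption about how the totals were formed, and the names used are introduced for this example.

```python
# Experimental-group free-recall proportions for specific facts (Table 2).
N_CHANGED, N_UNCHANGED = 22, 41  # idea units with changed / unchanged details

IMMEDIATE = {"changed": 0.48, "unchanged": 0.61}
DELAYED = {"changed": 0.34, "unchanged": 0.76}


def pooled_total(cell):
    """Total proportion, assuming all 63 idea units are pooled with equal weight."""
    hits = N_CHANGED * cell["changed"] + N_UNCHANGED * cell["unchanged"]
    return hits / (N_CHANGED + N_UNCHANGED)


# Delayed minus immediate recall, separately for each kind of detail.
CHANGE = {kind: round(DELAYED[kind] - IMMEDIATE[kind], 2) for kind in IMMEDIATE}
TOTALS = (round(pooled_total(IMMEDIATE), 3), round(pooled_total(DELAYED), 3))
```

The two changes have opposite signs and nearly cancel in the pooled totals, which agree with the tabled totals of .56 and .62 to within rounding. This is why the total scores alone show no net retroactive interference.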
General facts. A general fact is said to be
recalled if the subject recalls the correct
general category of a thing or relation
between things, though possibly with the
wrong detail. Free-recall proportions for
general facts are shown in Table 3, with the
facts divided according to whether their
details were changed or unchanged between
the originally learned and the interpolated
passages.


TABLE 2

FREE-RECALL PROPORTIONS FOR SPECIFIC FACTS
ON IMMEDIATE AND DELAYED TESTS

Test/specific facts    Experimental    Control
Immediate
  Changed              .48             .53
  Unchanged            .61             .68
  Total                .56             .63
Delayed
  Changed              .34             .51
  Unchanged            .76             .73
  Total                .62             .66


TABLE 3

FREE-RECALL PROPORTIONS FOR GENERAL FACTS
ON IMMEDIATE AND DELAYED TESTS

Test/general facts     Experimental    Control
Immediate
  Changed              .62             .67
  Unchanged            .66             .72
  Total                .64             .70
Delayed
  Changed              .76             .71
  Unchanged            .80             br
  Total                .79             .75


These general-fact-recall proportions were 
used in an analysis of variance similar to 
that described above, with one between- 
subjects factor and two within-subjects 
factors. The chief effects of interest are the 
overall increase in recall scores from the 
immediate to the delayed test (for retention 
interval, F = 21.68, df = 1/24, p < .01) 
and the fact that this increase was larger 
for experimental than for control subjects 
(interaction, F = 5.40, df = 1/24, p < 
.05). The changed versus unchanged factor 
did not enter into any of these significant 
interactions, although it was associated 
with a significant main effect: unchanged 
items were recalled slightly better than 
changed items even initially (F = 6.36, 
df = 1/24, p < .05); this is the same ma- 
terials construction artifact noted before. 
The difference scores (delayed minus im- 
mediate free-recall proportions) make trans- 
parent the pattern of significant outcomes: 
experimental subjects improved their free 
recall of general facts more than control 
subjects, and this improvement was as large 
for general facts with changed details as 
for those having unchanged details. We may 
therefore conclude that improvement in re- 
call of the conceptual macrostructure is 
approximately the same whether the under- 
lying details are changed or kept the same 
from the original to the interpolated texts. 
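
The difference scores referred to above can be recomputed from Table 3, reading the entries as proportions. The Python sketch below is a minimal illustration that uses only the clearly legible entries (the experimental cells and the two group totals); the names EXPERIMENTAL, TOTALS, and gain are introduced here for the example.

```python
# Free-recall proportions for general facts, read from Table 3.
EXPERIMENTAL = {
    "changed": (0.62, 0.76),    # (immediate, delayed)
    "unchanged": (0.66, 0.80),
}
TOTALS = {
    "experimental": (0.64, 0.79),
    "control": (0.70, 0.75),
}


def gain(immediate, delayed):
    """Difference score: delayed minus immediate proportion."""
    return round(delayed - immediate, 2)


GAIN_BY_DETAIL = {kind: gain(*cell) for kind, cell in EXPERIMENTAL.items()}
GAIN_BY_GROUP = {group: gain(*cell) for group, cell in TOTALS.items()}
```

For the experimental subjects the gain is .14 for general facts with changed details and .14 for those with unchanged details, while the overall gain is .15 against .05 for the controls.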


DISCUSSION


The results accorded with commonsense 
expectations: retroactive interference was 
observed for changed details and retroactive
facilitation for unchanged details.
Interpolated passages that were similar to 
the originally learned passage but which 
had altered details were nonetheless bene- 
ficial in facilitating retention of the con- 
ceptual macrostructure of the original 
passage. The interpolated passage boosted 
free recall of the appropriate kinds of facts, 
while lowering the accuracy of the details 
recalled. The effect here is similar to that 
encountered in free recall of categorized 
word lists: a taxonomic category that is 
repeated on an interpolated list but with 
different instances will be thereby enhanced 
in category recall of the original list but 
at the expense of a lower score on a conditionalized
measure of “items per recalled
category” (see Shuell, 1968). 

Some discussion is needed regarding lim- 
itations on the generality of our result. 
First, the learning and retention intervals 
here were short, spanning 35 minutes at 
most, and similar experiments are required 
at longer intervals and with larger amounts 
of learning material. Second, a serious lim- 
itation is that the experiment used only 
one specific biography as the critical orig- 
inally learned material. One should like to 
generalize the results to a larger population 
of materials—and not only to biographies 
but to other “fact-listing” passages and 
possibly even to less “fact-oriented” materials.
Therefore, replication of the experiment
with several different sets of materials
would clearly enhance the generality of its 
principal conclusions. 

An issue requiring discussion is whether 
our procedures oriented the subject far too 
much to word-for-word or verbatim mem- 
orization, stripping the text of its meaning- 
fulness and only thereby creating conditions 
favorable for demonstrating interference ef- 
fects in retention. We have serious doubts 
about the dichotomies implicit in such a 
remark. Nothing in interference theory restricts
its applicability to meaningless material
or to “factual details” such as names,
dates, locations, and numbers. Simple sen- 
tences may be conceived of as being learned 
by associating concepts corresponding to 
the subject and predicate of the sentence
(see Anderson & Bower, 1973). For example,
using such simple sentences as “the
uncle shouted an obscene remark,” Anderson
(1971) showed that following study of
such sentences, later recall of the subject 
term (uncle) was cued almost as well by a 
paraphrase of the predicate (“——yelled 
some dirty words”) as by the verbatim pred- 
icate that had been studied. Moreover, 
this close correlation in cueing effectiveness 
of the verbatim predicate and its paraphrased
predicate was also found for subjects
instructed to learn the original sentences
by “rote, verbatim repetition.” The
implication is that even with instructions
designed to promote “nonnormal,” rote
processing of sentences, the subject nonetheless
and willy-nilly established associations
between semantic concepts. In another
relevant study, Anderson and Carter (1972) 
found that recall of a subject-predicate 
construction showed retroactive interference
from interpolated learning of a paraphrased
predicate paired with a different subject
(the originally learned predicate was the 
recall cue for the originally learned reten- 
tion test). Such interference must have 
been at a semantic level. Such evidence in- 
dicates that interference principles apply to 
retention of meaningful text passages. 

A fourth issue requiring discussion con- 
cerns the clarity and utility of the distinc- 
tion between the conceptual macrostructure 
of a passage and its detailed microstruc- 
ture. This distinction currently rests on 
linguistic intuitions. Even formal systems 
for analyzing the semantical structure of 
text such as those of Crothers (1972) and 
Fredericksen (1972) only try to describe 
explicitly our intuitions regarding succes- 
sive “levels of abstraction” from a specific 
text. Starting from a detailed statement 
(e.g., “Mary washed her pet cat”), we may 
generalize it by substituting a more general 
category for each content word or by deleting
a qualifier (e.g., “a girl cleaned her
pet”); alternatively, we may generate an 
interfering specific statement by replacing 
the original instances with others. If these 
substitutes are within the same category 
(e.g., “Mary brushed her pet dog”), then 
the same categorical relationships have been 
repeated and these connections should be 
strengthened in memory. If the substituted
instances come from different categories
(e.g., “Mary cleaned her bedroom”), then
interference or unlearning would be expected
of the original conceptual (general)
relation. Admittedly, assignment of particu- 
lar statement pairs to different levels of 
abstraction is not always clear, nor is the 
decision as to what is the same versus a 
different conceptual relation. Nonetheless, 
the decisions are sharp in many compari- 
sons (e.g., names, dates, places in biogra- 
phies), and it is an important analysis to 
aim for. Why? Because, following Bartlett 
(1932), most of us believe that text recall 
is initially a matter of the person remem- 
bering the general gist or an “abstract out- 
line” of a text, and then fleshing out the 
details of this abstract (partly by “recon- 
struction”). Our biography-recall data il- 
lustrate in a simple, transparent context 
how repetition of an abstract outline facili- 
tates recall of at least the right general 
kinds of facts. Presumably, the same facili- 
tation could be shown for materials other 
than listings of specific names, dates, num- 
bers, and places. Presumably, also, one 
should be able to interfere with recall of the 
conceptual macrostructure by breaking the 
originally learned conceptual relations with 
the interpolated passage. These are matters 
for future research. 

Finally, let us consider a practical matter, 
namely, interference potentialities in learn- 
ing school materials. A critic might point 
out that in the present experiment, retro- 
active interference was produced only by 
highly similar passages, and even then the 
magnitude of the forgetting was not dra- 
matic. Is such retroactive interference, 
then, a phenomenon likely to be restricted 
to only a small portion of what a student 
learns? The critic may ask, “After all, how 
often does a curriculum teach a student 
totally different answers to the same ques- 
tion?” 

The answer, of course, is “practically 
never,” but the critic’s charge plays upon a 
deceptive indefiniteness in our notions of 
“the same question” and “different an- 
swers.” Suppose that memory for a simple 
proposition depends upon associating the 
semantic concept underlying the subject 
(A) to that underlying the predicate (B) so 
that the associative structure is A-B. By
such analyses, then, interference effects
should arise whenever knowledge requires a
“multiple listing” of predicates that apply
to the same concept or similar concepts 
(see Anderson & Bower, 1973). One example 
would be a zoological classification in which 
a species is characterized by a list of salient 
attribute-value pairs (e.g., heart type, ver- 
tebrate-invertebrate, habitat, color, size, 
etc.) and different species are defined by a 
conjunction of different values on the same 
attributes. A second example would be 
temporal conjunctions defining historical or 
contemporaneous events. A schematic ex- 
ample would be, “When A was King and B 
was his General, War C was fought. But 
when A’ was King and D was the General, 
War E was fought.” Confusion would surely
reign when the student sorts out in mem- 
ory which king went with which general 
and which pair got involved in which war. 
A third example where interference may be 
expected is when the same objects (e.g., 
different historical characters) are to be 
rank ordered along two independent dimen- 
sions, such as historical date and relative 
power or senatorial seniority and political 
influence. To the extent that objects oc- 
cupy differing locations in the two linear 
orders, confusions and interference may be 
expected in learning and keeping separate 
the two orders.

These examples (and many more of this
general type come readily to mind) suggest
many opportunities for interference process- 
es to exert a pervasive influence on learn- 
ing and retention in the usual curriculum. 
Moreover, these interference effects would 
seem to apply just as formidably to the
retention of “meaningful principles” or
rules as they do to “straight facts.” First,
the distinction between principles and facts 
is none too clear anyway. Second, under 
proper analysis, rules would seem to be 
decomposable into atomic propositions con- 
nected by conditionals which, if the indi- 
cated analysis has merit, are just as vulner- 
able to retroactive interference when it is 
“aimed” at the proper level.


COEDUCATION, VALUES, AND SATISFACTION WITH SCHOOL


Boys and girls from the two senior classes in eight Adelaide, Australia,
coeducational and single-sex state secondary schools ranked sets of 
values from the Rokeach Value Survey, first in order of importance 
for themselves (own values) then in the order they thought their 
schools would emphasize them (school values). They then completed 
a modified form of the Cornell Job Description Index and a rating of 
happiness with school. Factor analysis indicated basic similarities 
across schools in the ordering of both own and school values, but no 
factor emerged contrasting average value systems for coeducational 
versus single-sex schools. Boys in coeducational schools were more
satisfied with classmates and teachers than were boys in single-sex
schools. Results were related to recent theory and research.


It is commonly argued that coeducation at 
the secondary school level is necessary to 
prepare children to take their places natu- 
rally in the world of men and women. It is 
contended that the social environment of the 
coeducational school would be less artificial 
than that of the single-sex school, and the 
adaptations learned in an environment that 
more accurately mirrored that of the wider 
social context would better equip children to 
adjust to the adult world beyond the school. 
The argument is a compelling one, but sur- 
prisingly, very few studies have been con- 
ducted to compare the effects of coeduca- 
tional and single-sex schools on the children 
attending them. 

What sorts of studies have been carried 
out? In Britain, Dale (1969, 1971) has con- 
ducted an extensive program of research into 
coeducation in which questionnaires were 
administered to pupils, ex-pupils, and 
teachers. He found that coeducational 
schools were generally preferred to single- 
sex schools by both teachers and students. 
Flinders University and by the Australian Research
The author is indebted to Andrew Ellerman for
his assistance in data analysis.

2 Requests for reprints should be sent to N. T:
sity of South Australia, Bedford Park, South
Australia 5042.


The school atmosphere was thought to be 
more congenial in coeducational schools, and 
students saw their teachers as friendlier and 
more helpful. Single-sex schools were per- 
ceived to involve stricter discipline, and 
teachers in these schools were seen as more 
distant. Dale (1969) reported that there 
was a tendency “for the differences between 
the attitudes of boys in boys’ schools and 
those of boys in mixed schools, towards their 
school life, to be less sharp than are the 
comparable differences between the two 
groups of girls [p. 232]." He also noted that 
there was no evidence that the education of 
the sexes together resulted in a lowering of 
academic standards. Indeed, as far as boys 
were concerned, the weight of the evidence 
seemed to be on the opposite side (Dale, 
1962a, 1962b, 1964). 

A study recently reported by Jones, Shall- 
crass, and Dennis (1972) and conducted in 
New Zealand was less positive toward co- 
education. Students in a boys’ school, a 
girls’ school, and a coeducational school 
completed items from a questionnaire used 
by Coleman (1961). The authors were in- 
terested in testing Coleman’s suggestion that 
status in the adolescent society of the co- 
educational secondary school may depend 
more upon popularity than upon scholastic 
or intellectual achievement, with a conse- 
quent emphasis upon “rating and dating.” 
Hence coeducation may have a stultifying
effect on intellectual activities and “may 
be inimical to both academic achievement 
and social adjustment [Coleman, 1961, p. 
51].” Jones et al. did, in fact, find some 
support for Coleman’s suggestion in their
analysis of the responses of boys and girls to
the questionnaire. Furthermore, differences
on most items were larger for the two groups 
of girls than for the two groups of boys. 
Thus, as in the studies conducted by Dale, 
the influence of the different types of school 
was more evident among girls than among 
boys.3

The present study was designed to pro- 
vide information about the effects of coedu- 
cation in the Australian context, using a 
limited sample of schools drawn from the 
Adelaide metropolitan area. It focused 
upon differences between coeducational and 
single-sex high schools both in regard to the 
relative importance assigned by students to 
different values and in regard to their ex- 
pressed satisfaction with various aspects of 
the school situation. The concept of value 
is a central one in social science (Rokeach, 
1968a, 1968b), yet curiously there has been 
less empirical examination of general value 
differences than there has been of differences 
in more specific attitudes, possibly because 
of the lack of suitable instruments for as- 
sessing values. There has been no lack of 
theoretical discussion of the value concept, 
however, and of its relevance to wide areas 
of social science, including education (e.g., 
see Rokeach, 1968a, 1971, 1973; Williams, 
1971). The present study was one aspect 
of an extensive program of research on 
values stimulated in part by Rokeach’s 
analysis (for details see Feather, 1972c). 

3 It is obviously very difficult to compare the
results of the two studies just described. One
would have to make allowance for factors associated
with the different cultures, for differences in
the assessment procedures that were employed, for
differences in subject characteristics, and indeed for
a host of varying factors that would affect
comparisons.


One might expect coeducation to affect
both values and attitudes among students.
If Coleman (1961) and Jones et al. (1972)
are correct, then students from coeducational
schools might regard values relating
to social approval and affiliation as more im- 
portant for self than would students from 
single-sex schools, and this difference might 
be especially evident among the girls. Fol- 
lowing Dale, one might also expect students 
from single-sex schools to see their schools 
as placing more emphasis upon values con- 
cerned with discipline and control than 
would students from coeducational schools. 
As far as attitudes are concerned, if teachers 
are friendlier and more helpful in coeduca- 
tional schools, then one would expect this 
difference to be evident in students’ reported 
satisfaction with their teachers. Similarly, if 
the social environment of the coeducational 
school is seen as richer and more complete, 
involving, as it does, both sexes, then one 
would expect coed students to express more 
satisfaction with their classmates. It was 
possible to explore these types of questions 
concerning general values and more specific 
attitudes by using the Rokeach Value Survey
(Rokeach, 1968a, 1968b, 1971, 1973)
and a modified form of the Cornell Job De- 
scription Index (Smith, Kendall, & Hulin, 
1969). These measures were selected because 
they have had a lot of use in recent years, 
are well researched, and were appropriate 
to the subjects involved in the study and to 
the overall aim of the general survey of 
which the present study was part. 


METHOD 


Subjects and Schools 


The present study formed part of an extensive 
survey involving nearly 3,000 school children at- 
tending their last two years of secondary school 
education (leaving and matriculating classes) in 19 
schools in the metropolitan district of Adelaide, 
Australia (Feather, 1972a, 1972b, 1972c). The pres- 
ent investigation involved 8 of these schools, all of 
which were administered and funded by the state 
government. Of these, five were high schools offer- 
ing a fairly general range of courses. The remain- 
ing three high schools concentrated more upon 
technical and commercial courses. The numbers
of subjects from the various schools were as
follows: coeducational schools—General High 
School A, 99 boys, 79 girls; General High School 
B, 163 boys, 127 girls; General High School C, 95 
boys, 90 girls; Technical High School A, 71 boys, 
56 girls; single-sex schools—General High School 
D, 143 boys; Technical High School B, 153 boys; 
General High School E, 126 girls; and Technical 
High School C, 105 girls. Most children were in
the age range 15-17 years. The youngest boys and 
the youngest girls were those in the single-sex 
technical high schools, where the proportion of 
final year matriculating students was very low. 
Status of father's occupation, as measured by the 
Congalton Index (Congalton, 1969) was slightly 
higher for both boys and girls in the coeducational 
high schools than for boys and girls in the single- 
sex schools and slightly higher for girls only in the 
general high schools when compared to girls in the 
technical high schools. 
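
The school-by-school counts listed above can be tallied as a quick check on group sizes. The following is only a bookkeeping sketch: the dictionary layout and the function name `totals_by_group` are illustrative choices, not part of the original study, and a count of 0 simply marks a sex not enrolled at a single-sex school.

```python
# Subject counts as reported in the text: (boys, girls) for each school.
# The data structure and names below are illustrative assumptions.
counts = {
    ("coeducational", "General High School A"): (99, 79),
    ("coeducational", "General High School B"): (163, 127),
    ("coeducational", "General High School C"): (95, 90),
    ("coeducational", "Technical High School A"): (71, 56),
    ("single sex", "General High School D"): (143, 0),
    ("single sex", "Technical High School B"): (153, 0),
    ("single sex", "General High School E"): (0, 126),
    ("single sex", "Technical High School C"): (0, 105),
}


def totals_by_group(school_counts):
    """Sum subjects for each (sex composition, sex) combination."""
    totals = {}
    for (composition, _school), (boys, girls) in school_counts.items():
        for sex, n in (("boys", boys), ("girls", girls)):
            if n:
                key = (composition, sex)
                totals[key] = totals.get(key, 0) + n
    return totals


totals = totals_by_group(counts)
grand_total = sum(totals.values())

for key in sorted(totals):
    print(key, totals[key])
print("all subjects in the eight schools:", grand_total)
```

The cell sizes are clearly unequal, which is the situation the unweighted-means analyses reported later are meant to handle.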


Procedure 


Details of test administration, including information
about the tests used and specific instructions,
have appeared elsewhere (Feather, 1972a). On
Form E of the Rokeach Value Survey subjects are 
normally required to rank sets of values (18 termi- 
nal values and 18 instrumental values) in their 
order of importance, the values being presented 
alphabetically and with short descriptive phrases. 
In the present survey there was not sufficient time 
to permit subjects to rank both sets of values; the 
questionnaires were randomly distributed so that 
approximately half the subjects ranked the termi- 
nal values only, while the remaining subjects 
ranked the instrumental values only. Each subject 
ranked whichever set of values he or she had in two 
ways, first in relation to self, then in relation to 
the school he or she attended. The first set of in- 
structions asked subjects to study the list of 18 
values carefully, then to place a 1 next to the 
value “which is most important to you, place a 2 
next to the value which is second most important 
to you, etc. The value which is least important,
relative to the others, should be ranked 18." The 
second set of instructions asked subjects to “as- 
sume that your School is attempting to turn out 
children with certain kinds of values, some of which 
the School considers to be more important than 
others.” They were then asked to “think of a 
student who has these values that the School 
strives to emphasize and think of the order in 
which the School would emphasize them." They 
then ranked the values 1-18 according to how 
they thought the school would emphasize them 
from most important to least important. The two 
sets of rankings provided for self and then for 
School represented own value systems and school 
value systems, respectively. 

A modified form of the Cornell Job Description 
Index (Smith, 1967; Smith et al., 1969) developed 
by the author to apply to the school situation was 
used to obtain measures of school satisfaction. This 
modified Job Description Index required subjects 
to check lists of 18 items as they applied to school- 
work, people in my class, and the typical teacher. 
Separate satisfaction scores that could range 0-18 
in the direction of increasing satisfaction were ob- 
tained for each of these three aspects of the school 
situation (see Feather, 1972a). 

Finally, subjects were asked to rate how much 
they enjoyed being at school by putting a cross on
a five-inch scale labeled “Very happy at school” at 
one extreme, “Don’t like school at all” at the 
other extreme, and “Moderately happy at school” 
in the middle of the scale. Responses were scored 
1-9 in the direction of increasing happiness, with 
a score of 5 spanning the midpoint of the scale. 


RESULTS AND DISCUSSION 


Analysis of Average Value Systems 


The median rankings for each set of 18 
values (terminal or instrumental) were cal- 
culated for each of the eight schools for both 
self (own) values and for school values. For 
the set of 18 terminal values, there were 
16 different sets of medians (or average 
value systems) corresponding to the com- 
bination of eight schools and two types of 
ranking (own values and school values). 
These 16 average orders were intercorrelated 
using the Spearman rank-order procedure to 
yield a 16 x 16 matrix for the terminal 
values. This correlation matrix, with unities 
in the diagonal cells, was factor analyzed 
using the principal-component method in 
conjunction with varimax rotation for eigenvalues
greater than one. Two factors
emerged; the first, accounting for 47.32%
of the variance, could be clearly identified
as a school value factor, indicating a basic
similarity in the median rankings of school 
terminal values across all conditions. The 
second factor, accounting for 42.57% of the 
variance, could be clearly identified as an 
own value factor, indicating a basic simi- 
larity in the median rankings of own ter- 
minal values across all conditions. The na- 
ture of these average orders is indicated in 
Table 1 for boys and girls in coeducational 
and single-sex schools. 
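
The intercorrelation step described above can be illustrated for a few pairs of average orders. This is a minimal sketch in plain Python, assuming average ranks for tied medians; it covers only the Spearman correlations, not the principal-component extraction or varimax rotation. The three lists are the boys' columns of Table 1 as printed (coeducational own, single-sex own, coeducational school), and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Median rankings of the 18 terminal values, boys only, in Table 1 row order.
boys_coed_own = [10.79, 7.56, 7.50, 5.33, 12.82, 7.68, 8.38, 4.57, 7.08,
                 9.95, 6.43, 13.31, 11.03, 16.09, 10.60, 14.40, 5.38, 7.50]
boys_single_own = [9.13, 8.83, 8.06, 5.44, 12.05, 7.14, 6.50, 6.19, 6.94,
                   11.75, 6.32, 14.42, 10.06, 16.19, 10.46, 13.54, 4.97, 7.50]
boys_coed_school = [8.82, 11.58, 2.53, 8.38, 11.81, 5.68, 10.50, 7.50, 8.88,
                    11.57, 15.24, 10.83, 11.79, 14.29, 5.57, 6.59, 7.74, 2.54]

rho_own_own = spearman(boys_coed_own, boys_single_own)
rho_own_school = spearman(boys_coed_own, boys_coed_school)
print(f"own vs. own (coed vs. single sex): {rho_own_own:.2f}")
print(f"own vs. school (coed boys):        {rho_own_school:.2f}")
```

A high correlation between the two own-value orders alongside a much lower own-versus-school correlation is the pattern that would produce separate own and school factors.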

The same procedure was applied to the 
16 average orders for the set of 18 instru- 
mental values. A school value factor also 
emerged as the first factor in this analysis, 
accounting for 48.33% of the variance. Two 
other factors were also extracted, both 
clearly identifiable. The first, accounting for 
22.86% of the variance, was a female own 
value factor, indicating a basic similarity 
in the median rankings of own instrumental 
values across females. The second, accounting
for 20.21% of the variance, was a male
own value factor, indicating a basic similarity
in the median rankings of own instrumental
values across males. The nature of
these average orders is indicated in Table 2
for boys and girls in coeducational and
single-sex schools.


TABLE 1

MEDIAN RANKINGS OF TERMINAL VALUES IN RELATION TO SELF AND SCHOOL FOR BOYS AND GIRLS
IN COEDUCATIONAL AND SINGLE-SEX SCHOOLS

                                        Boys                           Girls
                             Coeducational      Single sex   Coeducational      Single sex
Terminal value                 Own  School     Own  School     Own  School     Own  School
A comfortable life           10.79    8.82    9.13    7.18   13.96    0.29   13.56   11.43
An exciting life              7.56   11.58    8.83   12.63    9.88   11.80    9.57   11.75
A sense of accomplishment     7.50    2.53    8.06    2.44    7.73    1.74    8.57    1.85
A world at peace              5.33    8.38    5.44    8.25    4.77    8.11    3.60     7.7
A world of beauty            12.82   11.81   12.05   10.82   11.53   10.93   12.00   10.38
Equality                      7.68    5.68    7.14    5.46    5.59    5.36    6.33    5.40
Family security               8.38   10.50    6.50    9.83    7.21    8.86    7.50    9.29
Freedom                       4.57    7.50    6.19    7.50    5.55    7.27    5.57    8.71
Happiness                     7.08    8.88    6.94    9.60    6.46   10.11    6.43   10.14
Inner harmony                 9.95   11.57   11.75   10.69    8.25   10.86    8.20   10.80
Mature love                   6.43   15.24    6.32   14.75    7.89   16.06    8.27   16.41
National security            13.31   10.83   14.42    9.00   13.59    9.60   12.59   10.33
Pleasure                     11.03   11.79   10.06   12.00   13.03   13.57   12.71   13.00
Salvation                    16.09   14.29   16.19   13.82   15.04   15.00   15.00   12.50
Self-respect                 10.60    5.57   10.46    6.54    9.75    6.00   10.64    5.77
Social recognition           14.40    6.59   13.54    6.14   14.09    6.31   14.15    4.90
True friendship               5.38    7.74    4.97    9.00    5.18    8.73    4.70    8.42
Wisdom                        7.50    2.54    7.50    3.44    6.05    2.00    6.45    2.53

Note. In each column low numbers denote high relative value.

Thus, as in other studies (Feather, 1972b, 
1973), factors involving own and school 
values emerged. There were basic disparities 
in the way students ranked their own values 
and in the way they ranked the values for 
their school, but neither factor analysis pro- 
vided evidence for a factor contrasting co- 
educational high schools with single-sex 
high schools. 


Analysis of Particular Values 


The rankings for each value were separately
analyzed using 2 × 2 analyses of
variance involving Sex Composition of
School (coed vs. single sex) × Type of
School (general vs. technical) as factors
and using the method of unweighted means
to allow for unequal ns (Winer, 1962). As
in previous studies (Feather, 1972a), the
rankings were first transformed using the
normal curve (Hays, 1967, pp. 35-39). The
analyses were run for males and females 
separately, first for the transformed rank- 
ings of own values then for the transformed 
rankings of school values. A conservative 
alpha level (p < .01) was adopted to test 
significance in view of the ipsative nature 
of the ranking procedure and the large num- 
ber of comparisons that were made. 
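
One way a normal-curve transformation of ranks might be carried out is sketched below. The text cites Hays (1967) without giving the formula, so the percentile convention used here, (rank - 0.5) / 18, is an assumption, as is the function name `rank_to_z`. Under this convention a rank of 1 (most important) receives the lowest z score; the sign could equally be reversed.

```python
from statistics import NormalDist


def rank_to_z(rank, n_items=18):
    """Map a rank (1..n_items) to a standard normal deviate.

    Assumed convention: the rank is treated as the midpoint of its
    1/n_items slice of the unit interval, then passed through the
    inverse normal cumulative distribution function.
    """
    if not 1 <= rank <= n_items:
        raise ValueError("rank must lie between 1 and n_items")
    proportion = (rank - 0.5) / n_items
    return NormalDist().inv_cdf(proportion)


# Transformed scores for every possible rank of an 18-value set.
z_scores = [rank_to_z(r) for r in range(1, 19)]
for rank, z in zip(range(1, 19), z_scores):
    print(f"rank {rank:2d} -> z = {z:+.3f}")
```

The transformed scores are symmetric about zero and stretch the extreme ranks relative to the middle ones, which is the usual motivation for transforming ranks before an analysis of variance.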

There were no significant differences be- 
tween the coeducational and single-sex 
schools in the relative importance assigned 
to particular terminal values by either boys
or girls for either self or school rankings. 
In regard to particular instrumental values, 
however, both boys and girls from single- 
sex schools ranked being clean as higher in 
relative importance than did students from 
coeducational schools; this was true for both 
self and school rankings. Girls in single-sex 
schools also saw their schools as placing 
relatively more emphasis upon being helpful 
and polite than did girls in coeducational 
schools; they also saw their schools as assigning
less importance to being intellectual
and logical.


TABLE 2

MEDIAN RANKINGS OF INSTRUMENTAL VALUES IN RELATION TO SELF AND SCHOOL FOR BOYS AND GIRLS
IN COEDUCATIONAL AND SINGLE-SEX SCHOOLS

                                        Boys                           Girls
                             Coeducational      Single sex   Coeducational      Single sex
Instrumental value             Own  School     Own  School     Own  School     Own  School
Ambitious                     7.40    2.81    6.20    2.85    9.41    1.76    8.75    2.00
Broadminded                   6.77   11.11    7.25   10.04    5.96   12.06    7.30    12.1
Capable                       8.21    5.98    8.36    6.63    9.41    6.75    9.81    6.25
Cheerful                      8.75   13.50    9.40   14.61    7.54   13.74    6.94   13.05
Clean                        10.53    8.08    8.38    6.65    9.81    8.54    7.75    6.58
Courageous                    9.90   13.75    9.56   14.03    9.83   13.61   11.17   14.05
Forgiving                    10.08   14.14   11.15   14.46    8.65   14.37    8.36   14.15
Helpful                      10.38   10.54   12.00   10.56    7.88   10.89    8.65    7.10
Honest                        4.70    5.82    3.90    5.93    2.58    6.11    3.25    5.75
Imaginative                  13.06   11.68   13.33   10.06   14.09   12.35   15.05   13.16
Independent                   9.00   11.26    7.43   10.69   11.58   10.50   10.59   11.44
Intellectual                 11.46    5.21   10.71    6.90   14.58    5.50   12.95    7.83
Logical                       8.91    8.31   10.08    7.83   12.21    8.27   12.79    9.70
Loving                       10.09   17.06   10.00   16.45    7.83   17.23    7.19   17.07
Obedient                     13.00    4.56   12.11    5.61   11.54    4.16   10.69    5.33
Polite                        9.40    5.56    9.83    6.81    8.16    5.09    7.28    3.70
Responsible                   5.00    4.48    4.63    5.03    3.80    4.33    4.93    4.36
Self-controlled               7.88    8.09    7.55    8.68    8.58    7.31    9.79    7.07

Note. In each column low numbers denote high relative value.

There was only one significant effect in- 
volving the Sex Composition of School x 
Type of School interaction (at the conserva- 
tive alpha level, p < .01): Boys in the co- 
educational technical high school ranked 
being forgiving as relatively more important 
for themselves than did boys in the single- 
sex technical high school, but there was no 
such difference among boys in the general 
high schools.4


4 Among the terminal values, boys ranked the
following values as relatively more important for
self than did girls: a comfortable life, mature love,
and pleasure. But girls ranked inner harmony as
relatively more important for self than did boys.
Among the instrumental values, boys ranked the
following values as relatively more important for
self than did girls: being ambitious, imaginative,
independent, intellectual, and logical. But girls
ranked the following values as relatively more
important for self than did boys: being helpful,
honest, loving, and polite. All of these differences
between boys and girls were highly significant
(p < .01 or p < .001).


In summary, therefore, main effects of
sex composition of school (coed vs. single
sex) were found only in regard to the in- 
strumental values and then, with the ex- 
ception of being clean, only for the girls and 
only for the perceived school values. Some 
of the differences supported Dale’s sugges- 
tion that single-sex schools might assign 
more importance to rules of conduct, but 
there were other values related to control 
and conduct (e.g., being obedient) that
showed no significant differences at all. Nor 
was there any support for Coleman’s sug- 
gestion that coed students might set greater 
store upon being popular and accepted by 
their peers than would students in the single- 
sex secondary schools. 

There were some main effects of type of 
school (general vs. technical) on value rank- 
ings. In particular, for both boys and girls, 
inner harmony was ranked as relatively less 
important and a comfortable life as rela- 
tively more important in the technical high 
schools than in the general high schools 
when rankings were for self. These results 
may reflect the greater vocational emphasis 
in the technical high schools. 


TABLE 3

MEAN SATISFACTION AND HAPPINESS SCORES FOR BOYS AND GIRLS IN RELATION TO SCHOOL ATTENDED

                                      M satisfaction scores
Type and composition                     People in     Typical     M happiness
of school                 Schoolwork     my class      teacher       rating
Boys
  Coeducational
    General                  7.81         10.22         10.90         5.08
    Technical                8.56         10.57         11.55         5.64
  Single sex
    General                  6.32          8.92         10.53         4.84
    Technical                8.86          9.55         10.03         5.23
Girls
  Coeducational
    General                  7.40         11.36         10.19         5.22
    Technical                8.48         10.04         11.43         5.73
  Single sex
    General                  8.24         10.58         10.06         5.34
    Technical                8.45         11.43         11.99         5.29


Analysis of Satisfaction with School 


Table 3 presents the mean satisfaction 
scores and mean happiness ratings for boys 
and girls in relation to school attended. The 
2 X 2 analyses of variance applied to the 
data for the boys indicated that boys in the 
coeducational high schools reported more 
satisfaction both with people in my class 
and with the typical teacher than did boys 
in the single-sex high schools (p < .001 and
p < .05, respectively). There was a signifi- 
cant Sex Composition of School x Type of 
School interaction in regard to satisfaction 
with schoolwork (p < .01): Boys in the co- 
educational general high schools reported 
more satisfaction with their schoolwork 
than did boys in the single-sex general high 
school, but this difference was in the re- 
verse direction for boys in the technical 
high schools, although only marginally so. 
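
The direction of these effects can be read off the boys' cell means in Table 3. The sketch below assumes the unweighted-means convention of averaging cell means with equal weight regardless of cell size; it reproduces only the descriptive contrasts, since the within-cell variances needed for the significance tests are not reported. The function names are illustrative.

```python
# Boys' mean satisfaction scores from Table 3, keyed by
# (sex composition of school, type of school).
boys_cell_means = {
    "schoolwork": {
        ("coed", "general"): 7.81, ("coed", "technical"): 8.56,
        ("single", "general"): 6.32, ("single", "technical"): 8.86,
    },
    "people in my class": {
        ("coed", "general"): 10.22, ("coed", "technical"): 10.57,
        ("single", "general"): 8.92, ("single", "technical"): 9.55,
    },
    "typical teacher": {
        ("coed", "general"): 10.90, ("coed", "technical"): 11.55,
        ("single", "general"): 10.53, ("single", "technical"): 10.03,
    },
}


def composition_effect(cells):
    """Coed minus single-sex marginal mean, weighting cells equally."""
    coed = (cells[("coed", "general")] + cells[("coed", "technical")]) / 2
    single = (cells[("single", "general")] + cells[("single", "technical")]) / 2
    return coed - single


def interaction_contrast(cells):
    """Coed advantage in general schools minus coed advantage in technical schools."""
    general_gap = cells[("coed", "general")] - cells[("single", "general")]
    technical_gap = cells[("coed", "technical")] - cells[("single", "technical")]
    return general_gap - technical_gap


for measure, cells in boys_cell_means.items():
    print(f"{measure}: composition effect = {composition_effect(cells):+.3f}, "
          f"interaction contrast = {interaction_contrast(cells):+.3f}")
```

For schoolwork the coed advantage is positive in the general schools (7.81 vs. 6.32) and slightly negative in the technical schools (8.56 vs. 8.86), which is the reversal described in the text.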

The only significant effect involving co- 
education in the 2 X 2 analyses of the data 
for the girls was a Sex Composition of 
School x Type of School interaction (p < 
.01) : Girls in the coeducational general high 
schools expressed greater satisfaction with 
people in my class than did girls in the 
single-sex general high school, but this difference
was reversed for girls in the technical
high schools. There, the girls in the
single-sex school expressed greater satisfaction
for people in my class.

The girls in general tended to report
greater satisfaction with people in my class
than did the boys (p < .001), regardless of
whether their school was coeducational or
single-sex or general or technical.

There were some significant main effects
of type of school (general vs. technical) in
relation to the measures of satisfaction.
Both boys and girls in the technical high
schools expressed greater satisfaction with
schoolwork than did boys and girls in the
general high schools (p < .001 for boys;
p < .05 for girls). Girls in the technical
high schools reported greater satisfaction
with the typical teacher than did girls in
the general high schools (p < .001). Boys
in the technical high schools expressed
greater happiness with school than did boys 
in the general high schools (p < .05). In 
the technical high schools, children may see 
their schoolwork as more relevant to future 
careers than do children in the general high 
schools, where the vocational emphasis 
tends to be much less apparent and where 
the curriculum is rather more academic and 
general. 


CONCLUSIONS 


The results of the present study indicated 
that there were few differences in the way 
boys and girls from coeducational and 
single-sex schools assigned importance to 
values. The evidence was somewhat stronger 
for differences in attitudes toward aspects 
of the school situation when children from 
coeducational schools were compared to 
children from single-sex schools. There was 
no support for Coleman’s hypothesis concerning
possible adverse effects of coeducation
and limited support for Dale's suggestion
that single-sex schools may be seen as
more concerned with discipline and control. 
As far as values were concerned, the major 
distinction was between the value systems 
the students assigned to themselves and the 
order of values they believed their schools 
were trying to promote. 

In future investigations, one might employ
finer grain measures in the hope that
they would be more responsive to the effects 
of coeducation and, if possible, take a much 
wider sample of different types of school, 
both single-sex and coeducational. In the 
foreseeable future, however, finding a single- 
sex school within the state school system 
might (as one headmaster put it) be like 
looking for a saber-toothed tiger. Moreover, 
the move toward coeducation in Australia 
is a general one now beginning to extend to 
some of the independent schools as well. 
Hence the present study is in the unique 
position of presenting information about a
division that is rapidly disappearing. 
REMEDIATION OF LEARNING PROBLEMS AMONG
THE DISADVANTAGED 1


SEYMOUR FESHBACH 2 AND HOWARD ADELMAN
University of California, Los Angeles 


In an effort to help clarify whether variations in school input can
remedy academic deficiencies among the disadvantaged, a three-year
experimental project was carried out. Specifically, the focus was on
the impact of an intensive, individualized, and integrated remedial
program on a group of disadvantaged youngsters with learning problems.
The major findings are that (a) the disadvantaged students
learned as effectively as a comparable group of middle-class youngsters
and significantly better than disadvantaged youngsters enrolled in a
more traditional compensatory education program, and (b) these findings
held for young adolescents as well as elementary-age children.
The study is seen as providing evidence of the favorable consequences
for the disadvantaged of comprehensive, as contrasted with piecemeal,
educational programs.


There are few issues in American education
that are the object of as much attention
and controversy as the educational
problems presented by so-called “disadvantaged” 3
populations (Ornstein, Doll, Arnez,
& Hawkins, 1971). Indeed, the significance
of these problems extends beyond education,
having profound social, economic, and
political implications as well.


1 The project was supported by funds from the
California State Department of Education, Division
of Compensatory Education through the Bureau
of Professional Development's Research and
Teacher Education (RATE) program.

The authors wish to express a debt of gratitude
to the many individuals at the Fernald School and
in the Los Angeles Public Schools who participated
in the various aspects of this project. We wish to
particularly acknowledge the help of Frances
Berres, Associate Head of the Fernald School, who
assumed a major responsibility in administering
the Fernald School phase of the program. Special
mention should also be made of the devotion and
assistance given to various phases of the project
by John Long and Williamson W. Fuller III.

2 Requests for reprints should be sent to Seymour
Feshbach, Department of Psychology, University
of California, Los Angeles, California 90024.

3 As defined in Title I of the Elementary and
Secondary Education Act of 1965, the term
“disadvantaged” designates pupils from families with
income below $3,000 per year. It is clear, however,
that the term also is used to designate segments
of racial or ethnic minority groups and often is
intended to connote that such groups are culturally
different.

There is little disagreement as to the 
descriptive aspects of the problem, namely, 
that children from economically disadvan- 
taged populations perform more poorly on 
measures of academic achievement than 
children from advantaged populations. 
Sharply differing explanations, however, 
have been offered to account for such find- 
ings. In the case of blacks, a significant 
proportion of whom can be categorized as 
disadvantaged, one view suggests that raci- 
ally linked genetic factors are responsible 
for race differences in IQ and academic 
achievement (Jensen, 1969). A more widely 
held explanatory view focuses on pupil at- 
titudinal and cognitive characteristics that 
arise from variations in home environment 
(Deutsch, 1968; Hess, Shipman, Brophy, & 
Bear, 1968; Hunt, 1968). Both genetic and 
early environmental social pathology ex- 
planations have been pointedly criticized 
by Baratz and Baratz (1970), who suggest 
that the principal aetiological factor is the 
inappropriateness of most school programs 
for the black disadvantaged child. A num- 
ber of other writers have also stressed the 
inadequacy of the typical educational 
structure and program to which the disadvantaged
child is subject, although not necessarily
agreeing on which school characteristics
are ineffective or counterproductive
(Clark, 1965; Fantini & Weinstein, 1968). 
A related, alternative position focuses on 
the interaction between pupil and school 
characteristics (Deutsch, 1963; Feshbach & 
Adelman, 1971). 

These diverse views of the antecedents of 
academic deficiencies among the disadvan- 
taged have led to numerous variations in 
school inputs, but because of the apparent 
lack of success of remedial and enrichment 
programs at most age levels, there is a 
general pessimism regarding such programs. 
Nevertheless, it remains an open question 
as to whether variations in school input can 
result in the remediation of academic defi- 
ciencies among the disadvantaged. The pri- 
mary objective of the present investigation 
was to contribute data that could help
resolve this issue. Specifically, the focus was
on the impact of an intensive, individualized,
and integrated remedial program upon
academic achievement, intellectual and per- 
ceptual-cognitive functioning, aspiration 
levels, and self-attitudes of a group of dis- 
advantaged youngsters who manifested 
school problems. For purposes of compari- 
son, the impact of this intensive program 
was contrasted with a more traditional com- 
pensatory education program. In addition, 
the study provided the opportunity to ex- 
plore similarities and differences between 
groups of disadvantaged and middle-class 
children, all of whom manifested learning 
problems, with reference to a number of the 
variables under investigation. 


METHOD


Subjects 


Both the disadvantaged and advantaged stu- 
dents who participated in the project met the fol- 
lowing criteria. They all were (a) male, (b) of at 
least average intelligence (in a few instances, a 
youngster with an IQ in the high 80s was included 
if the data in his records suggested that the IQ 
indicated might underestimate his true ability), 
(c) one and one-half or more years retarded in basic
school skills, and (d) without severe neurological or
severe emotional disturbances.

In addition, the disadvantaged students lived in 
areas that were designated as poverty pockets, that 
is, average family income was approximately $3,000
a year. (It is recognized that the economic criterion
does not adequately define the concept of disad- 
vantaged as this term has come to be used in 
current literature. Nevertheless, income is undoubt- 
edly the best single criterion and predictor of a 
disadvantaged condition.) These children were 
chosen from a list that the counselors at each par- 
ticipating school prepared to conform with the 
above criteria. From these lists, the project staff 
selected different children for participation in the 
project during the academic years 1966-67, 1967-68, 
and 1968-69. During the first academic year, 30
elementary and 30 junior high disadvantaged 
youngsters participated in the study; during the 
subsequent two years, 50 elementary and 30 junior 
high disadvantaged youngsters participated. In 
each of these years, approximately 90% of the sub- 
jects were black. 

The advantaged students were all selected from 
the tuition-paying clients enrolled at the Fernald 
School. This school, a facility of the psychology 
department of the University of California, Los 
Angeles, is a research and training laboratory 
focusing on learning disorders not due to mental 
retardation or severe neurological or emotional 
pathology. With few exceptions, the advantaged 
students were middle- or upper-class Anglos. 


Design 

During the first year, the 60 disadvantaged 
youngsters selected were grouped into triplets, 
matched for age, IQ, race, and severity of learning 
deficit. From each triplet, one student was ran- 
domly assigned to the group that was bussed to the 
Fernald School; another student was assigned to 
the school enrichment program, which was con-
ducted in the home schools; and the third was as-
signed to a control group. In this way, each group
was assigned 10 elementary and 10 junior high stu- 
dents. Then, a group of 10 elementary and 10 junior 
high advantaged youngsters was identified from the 
regular Fernald School population to form a fourth, 
comparison group, matched for age, learning dis- 
ability, and approximate IQ with the disadvan- 
taged samples. 

In the second year of the program, although 
there were several changes, this same general de- 
sign and procedure was followed. The changes in- 
volved (a) doubling the number of elementary 
students in the school enrichment and control 
groups (from 10 to 20), (b) including three rather 
than two public schools, and (c) selecting public
schools in a different section of Los Angeles, that 
is, in the first year, the students had come from 
the Venice area on the western boundary of the 
city, and in the second year they came from mid-
city.


Remedial Programs

Two remedial programs were implemented and 
evaluated during this study: (a) the Fernald School
program and (b) the school enrichment program.
The Fernald School program differs from standard
school experiences in (a) the high degree to which
the classroom program is individualized and extra 
classroom supports are available, (b) the relatively 
low pupil-to-teacher ratio, and (c) the “special” 
qualities of the total environment, for example, an 
atmosphere of experimentation, emphasis on posi- 
tive reinforcement, and the reduction of school- 
related anxieties. An extensive discussion of the 
rationale underlying this program and of its char- 
acteristics is presented elsewhere (Fernald School, 
1969). 

The school enrichment program represented an
attempt to provide a traditional, but high qual-
ity, type of compensatory intervention program.
The individualized remedial procedures were pat- 
terned after those employed at the Fernald School, 
but, of course, other features of the Fernald pro- 
gram could not be exported, and the time devoted 
to remediation had to be limited. Consequently, 
the primary focus was on improving reading and 
language skills during three to five hours per week. 

In the first year, the program essentially sup- 
plemented the regular teachers’ reading lessons. 
Each of three Fernald teachers took a group of 
three to four pupils for one hour per day, three 
days per week. However, since a goal of the project 
was to have a wider impact, during the second 
and third years Fernald School teachers took total 
charge of the reading programs of the participating 
pupils. That is, the teachers worked simultaneously 
with the entire school enrichment group from a 
particular school. It was felt at the time that such 
an approach might result in a youngster's regular 
classroom teacher discussing other facets of his
program with the Fernald teachers, thereby allow-
ing them to present ideas for introducing individ-
ualized instruction into the regular classroom pro-
grams. (With reference to both remedial programs,
it was recognized that pupils could well have bene- 
fited from participating for more than one year. 
Unfortunately, resource limitations prevented this 
course of action.) 


Measures 


Various combinations of measures were used 
during each phase of the project. The choice of 
measures was guided by our interest in achieve- 
ment gains and related intellectual and perceptual- 
cognitive functioning. Since no completely satis- 
factory instruments were available with reference 
to the variables that were of interest, we tended to 
choose instruments that had been found useful by 
other investigators. The instruments relevant to 
the present discussion, used one or more times dur- 
ing the three years of the project, are: 

1. California Achievement Test (Tiegs & Clark,
1957, with norms revised in 1963);

2. Wechsler Intelligence Scale for Children
(a short form including six of the ten standard
subtests was used, based on the work of Enburg,
Rowley, and Stone [1961] and Carleton and Stacey
[1954]);

3. Auditory Discrimination Test (Wepman,
1958);

4. Visual Motor Gestalt Test (Bender, 1946);

5. Frostig Developmental Test of Visual Per-
ception (Frostig, Lefever, & Whittlesey, 1966);

6. Vocational Checklist, boys' form (Wright-
stone, Forlano, Frankel, Lewis, Turner, & Bolger,
1964);

7. Ethnic Attitudes Instrument (Gerard, 1969);

8. Locus of Control Scale (James, 1957).

It should be noted that due to administrative 
policies in the Los Angeles City School District, 
some variations in procedure were required in ad- 
ministering these measures to the school enrich- 
ment and control groups, and, indeed, some in- 
struments could not be given at all to these two 
groups. Other reasons for changes in the measure-
ment procedures were (a) if a measure proved to
be unreliable and to have limited utility and (b) if 
certain supplementary studies required the ad- 
dition of particular measures. 


RESULTS AND DISCUSSION


At the onset it was emphasized that while 
we have relied heavily on quantitative pro- 
cedures, our available measuring instru- 
ments could only capture a restricted 
segment of the behavior being assessed. 
Reading achievement tests reflect only a
part of a child's achievement in reading; 
our measure of vocational aspiration only 
tapped the surface of the child's feelings 
about his vocational options and likely fu- 
ture. And there are domains of behavior 
that were not assessed at all, but that both 
teacher and pupil may perceive as signifi- 
cant. Clearly, the issues are complex; the 
variables involved are many; the methodol- 
ogy is imperfect. 

To partially compensate for the dryness 
and, more particularly, for the limitations 
of our quantitative analyses, we have in- 
corporated a number of qualitative observa- 
tions in this report. While the quantitative 
analyses, albeit in a limited way, can stand 
by themselves, the qualitative observations 
cannot and are therefore interpreted in
conjunction with the numerical findings. 


Age and IQ Characteristics of the Samples 


Before reviewing the major experimental 
findings, it is helpful to consider some of the 
characteristics of our experimental samples. 
The number of subjects in each experimen- 
tal group who participated in at least one 
pre-post measure is presented by school
year and for all three years in Table 1. As is
almost inevitable in a field study of this
kind, the number of subjects who completed
the study differs from the number initially
selected.


TABLE 1
NUMBER OF SUBJECTS IN EACH EXPERIMENTAL GROUP WHO PARTICIPATED IN AT LEAST ONE PRE-POST MEASURE

Level          Fernald      Fernald         School       Control
               advantaged   disadvantaged   enrichment

School year 1966-67
Elementary     13           9               9            7
Junior high    11           8               10           10

School year 1967-68
Elementary     9            9               15           16
Junior high    11           11              9            9

School year 1968-69
Elementary     10           10              18           16
Junior high    9            9               10           9

All three school years combined
Elementary     32           28              42           39
Junior high    31           28              29           28


In the advantaged group, there were actu- 
ally a few more subjects than the number 
that had been planned. These were added, 
largely in the first year, because they were 
available and also improved the matching. 
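
The combined rows of Table 1 should equal the column-wise sums of the three yearly rows. The following is a minimal sketch of that check, assuming the counts are transcribed correctly from Table 1; the names `YEARLY_COUNTS` and `combined_counts` are illustrative only and do not come from the original analysis.

```python
# Counts transcribed from Table 1. Column order in each tuple:
# (Fernald advantaged, Fernald disadvantaged, school enrichment, control).
YEARLY_COUNTS = {
    "1966-67": {"elementary": (13, 9, 9, 7), "junior high": (11, 8, 10, 10)},
    "1967-68": {"elementary": (9, 9, 15, 16), "junior high": (11, 11, 9, 9)},
    "1968-69": {"elementary": (10, 10, 18, 16), "junior high": (9, 9, 10, 9)},
}


def combined_counts(level):
    """Sum the yearly counts for one school level, column by column."""
    rows = [year[level] for year in YEARLY_COUNTS.values()]
    return tuple(sum(column) for column in zip(*rows))


print(combined_counts("elementary"))   # (32, 28, 42, 39)
print(combined_counts("junior high"))  # (31, 28, 29, 28)
```

Both results agree with the "all three school years combined" rows of the table.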

The mean ages and IQs of the subjects 
included in each experimental group are pre- 
sented in Table 2. The IQs of the advan-
taged are higher than those of the disadvan-
taged boys. This difference was anticipated, 
since a more liberal criterion was used in 
selecting disadvantaged children with at 
least “average” IQs. 

The basis for the differences in IQ among 
the disadvantaged elementary children is 
less clear. These children were randomly as- 
signed to the various experimental condi- 
tions and should have comparable IQ scores. 
One minor factor contributing to the relia- 
bly lower IQ of the disadvantaged elemen- 
tary group is the somewhat greater mean 
IQ of the children in the enrichment and 
control groups who remained in the project 
as compared to the children who were not 
available for posttesting. Another possible 
source of bias lies in the initial selection.
While the children were randomly assigned
to each group and while the great majority 
of families then agreed to send their child 
to the Fernald School, some substitutions 
may have been lower IQ children. However, 
as the subsequent analysis shows, the exper- 
imental effects found for the elementary age 
children were similar to those obtained for 
the junior high school subjects where there 
were no significant group differences in IQ. 

TABLE 2
MEANS AND STANDARD DEVIATIONS OF AGE AND IQ OF EXPERIMENTAL GROUPS

               Advantaged       Disadvantaged    Enrichment       Control
Level          IQ      Age      IQ      Age      IQ      Age      IQ      Age

M (in months for age)
Elementary     [illegible]      91.5    117.4    94.3    115.5    96.5    116.9
Junior high    [illegible]      91.3    158.9    93.3    157.6    91.3    158.1

SD
Elementary     8.6     12.4     8.7     13.2     10.7    13.3     9.7     13.9
Junior high    6.9     11.9     7.5     7.0      8.6     8.2      8.8     7.9


TABLE 3
CALIFORNIA ACHIEVEMENT TEST TOTAL GRADE PLACEMENT

Level          Measure   Fernald      Fernald         School       Control
                         advantaged   disadvantaged   enrichment

Pretest M
Elementary     M         2.75         2.71            2.87         3.32
               SD         .99          .90            1.28          .79
               n         32           28              39           36
Junior high    M         5.98         6.15            5.76         5.50
               SD        1.06         1.11            1.34         1.56
               n         28           28              28           27
Total          M         4.26         4.43            4.08         4.25
               SD        1.92         2.00            1.98         1.60
               n         60           56              67           63

Change score M
Elementary     M         1.08         1.06             .68          .75
               SD         .65          .49             .73          .47
               n         32           28              39           36
Junior high    M         1.04         1.10             .57          .52
               SD         .64          .46             .47          .57
               n         28           28              28           27
Total          M         1.06         1.08             .63          .65
               SD         .64         [illegible]      .63          .52
               n         60           56              67           63


The initial IQ differences for the elemen-
tary subjects are reflected in the pretest
measures of achievement. The initial Cali-
fornia Achievement Test mean grade place-
ment score of the Fernald elementary dis-
advantaged children was 2.71; the mean for
the controls was 3.32; while the mean for
the controls who left the project was 2.96.
While pretest differences were present at the 
elementary level, these did not materially 
influence the outcome of the study. First,
there was a negligible relationship between 
initial level and amount of change (a rather 
surprising finding in view of the statistical 
regression effects). Second, comparable ef- 
fects were observed at the junior high level. 
Finally, special analyses were undertaken 
in which the effects of initial differences in 
pretest scores on subsequent posttest scores 
were eliminated through statistical proce- 
dures (covariance), and these analyses 
yielded results that were practically identi- 
cal to the comparisons of the amount of 
change displayed by each group. 


These and the other statistical references are
two-way (Condition X Age Group) analyses of 
variance and covariance performed using Biomedi- 
cal Computer Program BMD X64 "General Linear 
Hypothesis," written by Paul Sampson of the 
Health Sciences Computing Facility, University of 
California, Los Angeles. Specific comparisons 
among experimental groups were made as sub- 
analyses within the overall analyses. For further 
information on these procedures see Dixon (1969), 
Kempthorne et al. (1961), and Scheffé (1959). 


Group Changes and Differences 


Achievement testing. The first and most 
important empirical question to be consid- 
ered concerns the degree of movement in 
basic academic skills in the disadvantaged 
children who participated in the intensive, 
individualized, and socially integrated 
Fernald School program. The data on these 
children are contrasted with the academic 
progress of (a) the advantaged children en- 
rolled in the Fernald School program, (b) 
the disadvantaged children who participated 
in the more traditional compensatory 
(school enrichment) program, and (c) the 
control group of disadvantaged youngsters. 

The California Achievement Test was ad- 
ministered at the beginning and at the end 
of each academic year to all groups. This 
test consists of three components, reading,
arithmetic, and language skills (each of
which has two subscales), and yields a total
score descriptive of the child's overall grade
placement equivalent. The pretest and
change score means of these California
Achievement Test total grade placement
scores are presented in Table 3.
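
The "Total" rows of Table 3 can be related to the elementary and junior high rows by combining the two subgroup summaries. The sketch below is illustrative only: the function name `combine_groups` is hypothetical, and it assumes that the tabled standard deviations are sample values computed with an n - 1 denominator, which the report does not state.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) summaries into one overall (n, mean, sd).

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Sum of squares within each subgroup, recovered from its sd.
    within_ss = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    # Sum of squares due to subgroup means differing from the grand mean.
    between_ss = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    total_sd = math.sqrt((within_ss + between_ss) / (total_n - 1))
    return total_n, grand_mean, total_sd


# Fernald advantaged pretest rows from Table 3, as (n, M, SD).
elementary = (32, 2.75, 0.99)
junior_high = (28, 5.98, 1.06)

n, mean, sd = combine_groups([elementary, junior_high])
print(n, round(mean, 2), round(sd, 2))  # 60 4.26 1.92
```

Under that assumption the result reproduces the tabled total for the Fernald advantaged group (n = 60, M = 4.26, SD = 1.92). The large total standard deviation mostly reflects the three-grade gap between the elementary and junior high means rather than variability within either level.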

Reference has already been made to the
IQ differences between the Fernald disad- 
vantaged and control groups at the elemen- 
tary level, and the statistically reliable dif- 
ference in initial grade placement scores 
is probably a reflection of this IQ difference. 
The mean IQs of the Fernald disadvantaged 
and control students are comparable at the 
junior high level; therefore, the differences 
in initial grade placement scores at the jun- 
ior high level (also statistically reliable) 
cannot be attributed to IQ differences. 

The use of the IQ enables us to predict at 
a much better than chance level a child’s re- 
sponse in different learning situations. At 
the same time, we know that children with 
the same IQ manifest markedly different 
learning patterns and that the same child’s 
performance varies greatly in different 
learning circumstances. From the beginning 
of our work with the disadvantaged chil- 
dren, we were struck with their responsive- 
ness to different school and class situations 
and the variation in their behavior under 
these different conditions. During the first 
day of testing, the children tested at their 
home schools were restless, defensive, non- 
conforming, and negativistic. The matched
group of children tested at the Fernald
School, particularly the junior high boys,
were obliging, serious, and task oriented.
These behavioral differences may well have
influenced the achievement test perform-
ance.

While the pretest differences are of in- 
terest and importance, the key data lie in 
the change score means presented in Table 
3. The movement of the two groups of chil- 
dren at the Fernald School is remarkably 
similar. At both the elementary and junior 
high level and for both advantaged and dis- 
advantaged samples, the increase in grade 
placement is about a year and a month. In 
contrast, the movement in both the enrich- 
ment and control groups was significantly 
less. The increase in the Fernald disadvan- 
taged group is significantly greater than the 
corresponding increase in the control and in 
the enrichment groups at both the elemen- 
tary and junior high levels (based on both 
analyses of variance and covariance). At 
the junior high level, the increase in
the children who attended the Fernald
School is about twice the amount of change
in the school enrichment and control groups.
The finding that junior high
level children can derive substantial bene-
fits from a remedial program merits special
attention in view of the widely held opinion 
that compensatory remedial educational ef- 
forts are relatively ineffective for this popu- 
lation and are best expended at earlier ages. 

These data then indicate the following: 

1. The disadvantaged children who at- 
tended the Fernald School and the advan- 
taged children at the Fernald School made 
increases in grade placement scores of 
slightly more than one year. 

2. The disadvantaged children who at- 
tended the Fernald School made signifi- 
cantly greater gains than either the enrich- 
ment or control groups. 

3. The relative advantage of the Fernald 
disadvantaged children over the other 
groups was most pronounced at the junior 
high level. 

4. The enrichment children did not make 
significantly greater gains than did the 
controls. 
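
The statement that the junior high Fernald gain is about twice that of the other disadvantaged groups can be checked directly against the change score means in Table 3. This is a small arithmetic sketch; the variable names are illustrative and the values are the junior high change score means as read from the table.

```python
# Junior high change-score means (grade-placement years) from Table 3.
fernald_disadvantaged = 1.10
comparison_gains = {"school enrichment": 0.57, "control": 0.52}

# Ratio of the Fernald disadvantaged gain to each comparison group's gain.
ratios = {
    group: fernald_disadvantaged / gain
    for group, gain in comparison_gains.items()
}

for group, ratio in ratios.items():
    print(f"Fernald disadvantaged gain is {ratio:.2f} times the {group} gain")
```

The ratios come to roughly 1.93 and 2.12, in line with the description of the Fernald gain as about twice the gain of the enrichment and control groups.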

This pattern of findings holds for many 
of the subtests of the overall achievement 
test scale. However, the differences in the 
disadvantaged groups in reading achieve- 
ment were largely due to the great gains 
made by the Fernald children in reading 
comprehension. (The children’s teachers felt 
that the Fernald and enrichment groups did 
make substantial gains in reading vocabu- 
lary, but that these gains were not reflected 
in the California Achievement Test meas- 
ure, which tends to sample middle-class 
rather than lower-class linguistic terms. 
This possible bias could be particularly 
acute because the individualized methods 
used emphasize, as one important source of 
new reading vocabulary, the concepts that 
the child employs in his speech and in his 
story writing.) 

With reference to arithmetic achievement 
totals, there was very little difference at the 
elementary level between the Fernald dis- 
advantaged group and the other two 
groups in changes on the arithmetic reason-
ing subtest, while the differences in arithme-
tic fundamentals were larger and consistent
with the overall trend. At the junior high
level, the gain in arithmetic fundamentals
in the enrichment and control samples was 
negligible and significantly smaller than 
that of the Fernald disadvantaged group 
who showed a year's increment. The change 
in arithmetic reasoning in the Fernald dis- 
advantaged junior high group was particu- 
larly impressive. This group's mean gain of 
1.3 years was significantly greater than the 
gain of .9 years in the advantaged group and 
.7 and .6 years in the enrichment and con- 
trol groups, respectively. Since the skills en-
tailed in arithmetic reasoning are of a higher
conceptual order than the more rote content 
of arithmetic fundamentals, the gain 
achieved by the Fernald disadvantaged 
children acquires special significance. (It 
should be reemphasized that the enrichment 
group was not given special instruction in 
the arithmetic area.) 

While the differences between the school 
enrichment and control groups on the total 
language scale scores were slight, the disad- 
vantaged children at the Fernald School at- 
tained the highest scores of all the groups, 
significantly different from those of the en- 
richment groups and from those of the con- 
trols, although not from those of the ad- 
vantaged children. Again, the differences at 
the junior high level were larger than those 
at the elementary level. For the younger 
children, the difference between the Fernald 
disadvantaged students and the controls 
was statistically reliable, but that between 
the Fernald and the enrichment group failed 
to achieve statistical significance. While the 
control group made the least gain in spell- 
ing, the increments in the other groups were 
not much larger. None of the differences 
were statistically reliable, and they con- 
tributed in only a minor way to the differ- 
ences obtained on the total language meas- 
ure. The main source of these differences 
was the large increment obtained by the 
Fernald disadvantaged children on the Eng- 
lish mechanics subtest. At the elementary 
level, the Fernald disadvantaged children 
increased a little more than 1.1 years, and 
at the junior high level, they made a gain 
of 1.4 years in English mechanics. The latter
gain was significantly greater than that
achieved by either the enrichment or control 
groups. At the elementary level, these differ- 
ences only attain the .10 level of signifi- 
cance. 

Intelligence testing. Six subtests of the
Wechsler Intelligence Scale for Children
(WISC), namely comprehension, vocabulary,
arithmetic, similarities, picture arrangement,
and block design, were administered at the be-
ginning of the year, and due to limited
availability of testing time, especially for 
the enrichment and control groups, the first 
three of these subtests were given again at 
the end of the academic year only for the 
second and third years of the project. The 
numbers of boys in each experimental group 
used in analysis of these data are therefore 
fewer than the numbers available for the 
achievement comparisons, and reliable re- 
sults are more difficult to obtain. The pre- 
test and change means for the three re- 
peated subtests are presented in Table 4. 

While the differences in comprehension 
pretest means are not statistically reliable, 
the higher initial score of the control ele- 
mentary group is consistent with the higher 
initial scores attained on the achievement 
measures. The change scores on this sub- 
test are quite variable, and none of the dif- 
ferences are statistically significant. As an 
incidental note, the fact that the elemen- 
tary and junior high groups have compar- 
able scores should not be interpreted to 
mean that their absolute performance was 
the same. The numbers in the pretest table 
are weighted scores based on the age of the 
child as well as his performance. (A 
weighted score of 10 on each subtest is 
equivalent to an IQ of 100; an average of 7 
on each is equal to an IQ of 94; and an 
average of 8 is 96. The standard deviation 
of each scale is 3.)

TABLE 4
WECHSLER INTELLIGENCE SCALE FOR CHILDREN: COMPREHENSION, VOCABULARY, AND ARITHMETIC SUBTESTS

                 Fernald advantaged     Fernald disadvantaged  School enrichment      Control
Subtest/level    M       SD      n      M       SD      n      M       SD      n      M       SD      n

Pretest M
Comprehension
  Elementary     7.89    2.79    19     8.42    1.39    19     8.81    2.96    32     9.84    2.69    32
  Junior high    9.22    2.16    18     8.06    1.73    18     8.78    3.10    18     7.94    2.21    18
Vocabulary
  Elementary     10.67   2.11    18     8.74    1.73    19     9.47    2.33    32     9.53    1.74    32
  Junior high    9.95    1.68    19     8.11    1.33    19     8.94    1.66    18     8.28    1.60    18
Arithmetic
  Elementary     8.63    2.43    19     7.58    1.98    19     9.34    2.86    32     8.94    2.83    32
  Junior high    7.47    1.54    19     8.00    1.60    19     8.83    2.26    18     8.00    1.64    18

Change score M
Comprehension
  Elementary     1.42    3.22    19     1.26    2.13    19     .53     2.72    32     .41     2.67    32
  Junior high    -.17    2.28    18     1.11    2.52    18     .67     2.54    18     1.56    2.04    18
Vocabulary
  Elementary     .22     1.80    18     .84     1.83    19     1.16    2.38    32     1.31    2.52    32
  Junior high    1.05    1.54    19     .37     1.54    19     .00     .91     18     .78     1.83    18
Arithmetic
  Elementary     -.11    2.60    19     2.00    1.91    19     .00     2.74    32     .59     2.80    32
  Junior high    .63     1.57    19     1.32    1.80    19     -.28    1.18    18     -.39    1.46    18


The vocabulary scores on the pretest of
the advantaged children are significantly
higher than those of the disadvantaged
groups, while the differences among the lat-
ter are not statistically reliable. The superi-
ority of the advantaged children on the vo-
cabulary subscale of the WISC is consistent
with the results of other studies comparing
the linguistic repertoire of advantaged and
disadvantaged youngsters; the special fea-
ture of these data is that both advantaged
and disadvantaged samples were drawn 
from learning problem populations and were 
equated for severity of learning deficit. 
Although there appear to be some sizable 
differences in amount of vocabulary change, 
the vocabulary fluctuations are very vari- 
able and none of these differences are sig- 
nificant. The possible cultural bias of the 
vocabulary scale may make it relatively in- 
sensitive to vocabulary increments in dis-
advantaged populations.

No such limitation applies to the arith- 
metic subscale of the WISC. However, there 
are pretest differences on this measure that 
could have an influence on the change scores. 
These differences occur largely at the ele- 
mentary level, the pretest mean of the 
Fernald disadvantaged children being sig- 
nificantly lower than that of either of the 
other disadvantaged groups. This pretest 
difference is, in part, a consequence of the 
fact that some of the duller students left
the enrichment and control samples during
the course of the study, thereby elevating 
the mean score of the remaining children. 
Regardless of these initial differences, at 
both the junior high and elementary levels, 
the Fernald disadvantaged boys show a sig- 
nificant increase in arithmetic performance, 
which is reliably greater than that achieved 
by the other disadvantaged groups (based 
on both analyses of variance and covari- 
ance). Also, the increment is significantly 
greater than the change in the advantaged 
elementary school boys. The gains mani- 
fested by the Fernald disadvantaged groups 
on this arithmetic subscale can be viewed as 
an increment in IQ. The obvious connection 
between these changes and the increments 
found on the arithmetic achievement sub- 
tests points to the more general relationship 
between IQ and achievement and the often 
arbitrary distinction made between these 
two concepts. 

To determine the differences between the 
advantaged and disadvantaged groups, the
scores from the six-subtest short form of
the WISC were used. The means are some- 
what lower than in the case of the IQ data 
previously reported for the entire sample, 
but the size of the differences between the 
advantaged (elementary group = 96.5; 
junior high = 95.4) and disadvantaged 
(elementary = 89.5; junior high = 89.7) 
groups is comparable to the data cited in 
Table 4 and may be explained in the same 
way. Turning to the subtest scores, the 
largest differences between the Fernald dis- 
advantaged and advantaged boys, and the 
only individually reliable ones, were on the 
vocabulary and similarities subscales (p <
.0005 and p < .005, respectively), both of
which entail a high verbal factor. (While 
the comprehension subtest is also a matter 
of verbal understanding, it does not require 
verbal definition as is the case for vocabu- 
lary and similarities.) However, since the 
sample is small and the array of measures 
is limited, it would be premature to con- 
clude that intellectual differences between 
advantaged and disadvantaged learning 
problem populations lie primarily in the 
area of verbal proficiency. 

Perceptual-cognitive testing. The Wep- 
man Auditory Discrimination Test and the 
Bender Gestalt measure of perceptual-
motor functioning were administered at the
beginning and at the end of the academic
year in order to determine whether changes
in conceptual, academic skills were accom-
panied by systematic changes at the per-
ceptual level. These tests were given only to 
the elementary age children, since they are 
not appropriate to older age groups except 
where brain damage or related nervous 
system malfunctioning is suspected. The 
means of the pretest and change error scores 
for both tests are presented in Table 5. 

TABLE 5
WEPMAN AUDITORY DISCRIMINATION TEST AND BENDER VISUAL GESTALT TEST MEANS (ELEMENTARY STUDENTS ONLY)

Test       Measure   Fernald      Fernald         School       Control
                     advantaged   disadvantaged   enrichment

Pretest M
Wepman     M         1.89         3.49            3.83         3.54
           SD        1.49         2.74            2.00         2.60
           n         19           19              29           28
Bender     M         3.11         4.37            4.42         4.57
           SD        2.74         3.48            3.02         2.62
           n         18           19              31           28

Change score M
Wepman     M         0            -.79            -1.34        -.61
           SD        2.16         3.12            2.89         2.78
           n         19           19              29           28
Bender     M         -.61         -.84            -.77         -1.07
           SD        1.69         3.56            2.62         2.73
           n         18           19              31           28


The mean error score for the advantaged
children on the Wepman is significantly
lower than the mean error score obtained
by each of the disadvantaged groups. All of
the disadvantaged groups declined in error
scores; this change, however, is not signifi-
cantly different from the zero mean change
score of the advantaged group. The pretest
difference may well reflect a vocabulary
difference rather than one of auditory ca-
pacity or “tuning out,” inasmuch as famili-
arity with the words used in the Wepman 
would influence the error score. The Bender 
Gestalt data are minimally influenced by 
any verbal component. The child simply 
has to copy a figure, verbalizations only 
entering into the instruction for this test. 
The child’s productions were scored by the 
Koppitz method, higher scores reflecting 
more errors. Although none of the pretest 
differences are significant, the differences 
are in the same direction as obtained on 
the Wepman. (A similar result was obtained 
on the Frostig Developmental Test of Visual 
Perception.) If the three disadvantaged 
groups are combined and then compared to 
the advantaged children, the difference is 
statistically reliable. As in the case of the 
Wepman, there are no significant differences 
in change scores. 
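
Combining the three disadvantaged groups amounts to taking a mean of their pretest means weighted by group size. The following sketch illustrates this with the Bender pretest values from Table 5; the variable names are illustrative, and the code does not reproduce the significance test itself, for which the report gives no details.

```python
# Bender pretest error scores from Table 5 as (n, mean) pairs.
disadvantaged_groups = {
    "Fernald disadvantaged": (19, 4.37),
    "school enrichment": (31, 4.42),
    "control": (28, 4.57),
}
advantaged_mean = 3.11

# Weight each group mean by its sample size to get the combined mean.
total_n = sum(n for n, _ in disadvantaged_groups.values())
combined_mean = (
    sum(n * mean for n, mean in disadvantaged_groups.values()) / total_n
)
difference = combined_mean - advantaged_mean

print(total_n, round(combined_mean, 2), round(difference, 2))  # 78 4.46 1.35
```

The combined disadvantaged sample of 78 children averages about 4.46 errors, roughly 1.35 more than the advantaged mean. Pooling raises the sample size, which is one reason a difference that is not reliable for any single group can become reliable for the combined group.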

These data, then, indicate that the ex- 
perimental program had no significant effect 
upon these perceptual-cognitive skills that 
have been linked to learning problems, 
particularly in reading. While one cannot 
conclude from these data alone that changes 
on the perceptual-cognitive level were ir-
relevant to changes in academic skills, this
inference is certainly a reasonable one. The
poorer performance of the disadvantaged
as compared to the advantaged children re-
quires additional analysis and supplementa-
tion by other data before it can be ade-
quately interpreted. 

Attitudinal testing. The present brief dis- 
cussion of our findings with regard to group 
attitudinal differences is limited to locus 
of perceived control and evaluation, voca- 
tional aspirations, and ethnic attitudes. The 
last two instruments were readministered at 
the end of the year to assess changes in 
attitudes. There is some evidence that dis- 
advantaged populations are more likely to 
perceive themselves as controlled by ex- 
ternal and accidental forces than are 
middle- and upper-income groups (a not 
necessarily inaccurate perception). More 
germane to the present study, the Coleman 
report suggests that one of the best predic- 
tors of the disadvantaged child’s response to 
special experiences is the extent to which 
he feels that he can control his own fate. At 
the beginning of the second and third years 
of the project, we administered the locus 
of control and locus of evaluation instru- 
ments. The former scale assesses the degree 
to which the child feels that he, himself, 
versus external forces has control over his 
behavior, while the latter focuses on the 
basis for his self-evaluation. Higher scores 
on these scales reflect greater feelings of 
autonomy, self-reliance, and self-control. 

We found clear evidence of an age fac- 
tor, the elementary boys obtaining reliably 
lower internalization scores on both instru- 
ments than the junior high boys. Second, 
we found very little difference at the ele- 
mentary level between the advantaged and 
disadvantaged groups, but a number of 
interesting differences emerge at the junior 
high level on the Locus of Control Scale. 
The advantaged group obtained signifi- 
cantly higher scores than did the controls 
(F = 4.6, p < .05) and also obtained higher 
scores than did the enrichment group, al- 
though the latter difference fell short of 
statistical significance. What is most in- 
teresting, however, is the finding that the 
Fernald disadvantaged boys obtained reli- 
ably higher scores than those obtained by 
either the control (F = 6.6, p < .025) or 
enrichment groups (F = 4.3, p < .05). This 
Locus of Control measure was administered 
only a few weeks after the initiation of the 
experimental program, and these differences 
suggest that the simple exposure of the dis- 
advantaged boys to the Fernald School 
resulted in stronger feelings of autonomy 
and a greater acceptance of personal respon- 
sibility for one’s own performance and 
actions. 

A measure of vocational aspirations was 
administered as one means of determining 
whether the Fernald and enrichment ex- 
periences produced any changes in the 
child's perception of the opportunities avail- 
able to him and the level of vocational goals 
he set for himself. The advantaged children, 
as might be expected, were found to have 
higher initial aspirations than the disadvan- 
taged boys. However, the differences were 
not large, reflecting perhaps the fact that 
the advantaged children perceive them- 
selves as having learning problems that 
limit their vocational possibilities. We also 
found that the junior high children have 
more ambitious vocational aspirations than 
their elementary counterparts, which is en- 
couraging since it suggests that the junior 
high boys have not become overwhelmed 
and completely discouraged by their school 
failures. 

The change scores revealed very few reli- 
able differences. The vocational measure 
reflected a lowered level of aspiration for all 
of the junior high groups on retesting. The 
experimental program, then, was not effec- 
tive in raising the aspirations of the Fernald 
junior high boys, despite the gains they 
made in academic skills. Perhaps the boys 
were only being more realistic on retesting. 
At the elementary level, the Fernald dis- 
advantaged children showed an elevation in 
aspiration reliably greater than the change 
in the advantaged subjects, but since they 
were the lowest group initially, the change 
was not large enough to bring them in line 
with the other groups. 

One of the questions that was of central 
interest to us concerned the effects of the 
integration experience upon the child’s per- 
ceptions of his own and other ethnic groups. 
Because of school policies, we used a rather 
indirect procedure that entailed presenta- 
tion of photographs of Anglo (Caucasian), 
black and Mexican-American children and 
elicitation of judgments of characteristics of 
the children in the photographs. 

Both groups, in their initial judgments, 
gave higher rankings on kindness to the 
Anglo photographs, particularly at the 
junior high level. The Anglo photographs 
were also ranked much higher by both ad- 
vantaged and disadvantaged groups for 
“happiest boy” and boy who gets “best 
grades.” On the dimension of best grades, 
the junior high age group gave considerably 
higher rankings to the Anglo stimuli than 
did the elementary boys, whether advan- 
taged or disadvantaged. 

It is only on the traits denoting physical 
skills that the Anglo stimulus boys were 
given lower ranks than the black and Mexi- 
can-American stimuli. The Mexican-Ameri- 
cans were judged as the strongest, especially 
by the advantaged boys, with the black 
stimuli falling close behind. The judgments 
of "fastest boy" were much the same, with 
blacks and Mexican-Americans receiving 
similar ranks and the Anglo boys seen as 
less fast than the others, especially by the 
elementary groups. 

What is particularly striking about these 
data is the extent to which the Anglo ad- 
vantaged children and the largely black 
disadvantaged children shared a common 
conception of the relative attributes of 
Anglos, blacks, and Mexican-Americans. 
In terms of the child’s overall self-image, it 
would be interesting to know the relative 
importance of these traits for the advan- 
taged and disadvantaged child. 

The ethnic attitude change scores were 
not very illuminating and, in some respects, 
were rather disappointing. There were very 
few significant differences between the 
Fernald disadvantaged group and the en- 
richment and control groups in the degree 
and direction of change shown. In their 
rankings of “kindest boy,” the Fernald dis- 
advantaged elementary group increased 
their ranking of the black stimuli while at 
the same time lowering the rankings of the 
Anglo photographs. The corresponding 
changes in the elementary enrichment and 
control groups were directly opposite in 
direction. Again, at the elementary level, 
both Fernald groups saw the black child 
as happier on retesting than did the enrich- 
ment and control groups. However, the 
differences were reliable only for the Fer- 
nald advantaged group comparisons. Also 
at the elementary level, the Fernald dis- 
advantaged children lowered their rankings 
of the Anglo stimuli in judging fastest boy, 
while elevating the rankings of the Mexican- 
American and black stimuli in compensat- 
ing for this shift. There were no significant 
differences among the various groups in 
the changes observed in their rankings of 
boy with best grades and “strongest boy.” 

These data provide some evidence of a 
positive change in the Fernald disadvan- 
taged elementary children in the way in 
which they view members of their own 
ethnic group. However, this measure failed 
to reflect any reliable changes in the Fer- 
nald disadvantaged junior high level boys 
as compared to the other two disadvantaged 
groups. This is the only instance in which 
the disadvantaged junior high boys attend- 
ing the Fernald School displayed weaker 
experimental effects than their elementary 
counterparts. 


SUMMARY AND CONCLUSIONS 


There are several clear-cut findings that 
emerge from the data, other findings that 
form a trend consistent with the principal 
results, and still other data that are only 
suggestive or ambiguous. The major ex- 
perimental finding is clearly the increase in 
achievement observed in the disadvantaged 
children attending the Fernald School and 
the failure of the enrichment program to 
exert an influence significantly greater than 
that provided by the control experience. 
The California Achievement Test findings 
are buttressed by the qualitative perform- 
ance of the Fernald disadvantaged group 
and by a significant increment on the 
arithmetic subtest of the WISC. These 
effects are generally stronger for the junior 
high group than for the elementary group. 

One of the cognitive areas that did not 
reflect any experimental effects was vocabu- 
lary. Possible reasons for this have already 
been discussed. In addition, the improve- 
ments noted in achievement test perform- 
ance were not accompanied by significant 
changes in perceptual skills. The disadvan- 
taged elementary group did show initial 
deficiencies in the latter area and did im- 
prove in performance, but those changes ap- 
peared to be unrelated to the experimentally 
produced changes in the more complex 
basic school skills. 

The analyses of attitudinal changes 
yielded sporadic findings which, when sig- 
nificant, were consistent with the achieve- 
ment test findings. On the whole, however, 
from these data it would appear that pro- 
found or systematic changes in these affec- 
tive areas did not take place as a result of 
either the Fernald or the enrichment ex- 
periences. 

When the disadvantaged youngster is 
compared with the advantaged learning dis- 
order population, a number of cognitive dif- 
ferences emerge—in vocabulary and, for 
the younger age group, on the perceptual 
tasks. At the same time, there are striking 
areas of similarity between the two groups 
on other measures. 

In summary, the quantitative findings 
and the qualitative observations suggest 
that black children, both the young adoles- 
cent and the elementary-school-age child 
from economically disadvantaged areas and 
with marked deficits in school achievement, 
can derive significant benefits from major 
variations in school inputs, for example, 
major reorganization of classroom, school 
structure, and atmosphere as contrasted 
with the kind of piecemeal intervention so 
characteristic of Title I programs. These 
results offer one small piece of evidence to 
combat the increasing pessimism concerning 
the efficacy of compensatory educational 
programs for economically disadvantaged 
blacks and other minorities, a pessimism 
that we suggest is based upon an indiscrimi- 
nate analysis of Title I and related pro- 
grams, disregarding the variables that dis- 
tinguish effective and ineffective programs. 
At the same time, the data indicate that 
labels such as “culturally disadvantaged” 
and “learning disordered” do not encompass 
discrete diagnostic behavioral categories 
and can obscure important functional simi- 
larities between children assigned these 
diverse labels. The learning problems of an 
economically advantaged child require for 
their resolution substantial remedial efforts 
over a period usually measured in years. 
Should the learning problems presented by 
the economically disadvantaged child re- 
quire any less?5 

ATTENTION AND READING ACHIEVEMENT IN 
FIRST-GRADE BOYS AND GIRLS 

University of Minnesota 


A behavior observation schedule was utilized to investigate sex 
differences in classroom attentiveness and the relationship of such 
attentiveness to reading achievement among first-grade children, 74 
boys and 58 girls. Girls were found to be significantly (p < .01) more 
attentive than boys and to achieve higher word recognition scores 
(p < .05). Further, word recognition was found to be significantly 
(p < .01) related to attentiveness for the group as a whole, with 
reading readiness controlled in a covariance analysis. This latter 
finding replicates previous results with fourth- and sixth-grade pupils 
but demonstrates that the relationship obtains with beginning readers 
before a history of academic success-failure has been established. 


The purpose of this study was to deter- 
mine whether there are sex differences in 
classroom attentiveness and to determine if 
attention (i.e., visual orienting behavior or 
direction of gaze [Turnure, 1970, 1971]) is 
related to reading achievement. 

Numerous studies in reading have re- 
ported that in naturalistic classroom set- 
tings, reading achievement of girls is su- 
perior to that of boys (Dykstra, 1967; 
Gates, 1961). However, in the laboratory, 
where attentional behavior of the subject is 
more easily controlled, sex differences in 
reading-analogous paired-associate learning 
are not found (Jeffrey & Samuels, 1967; 
Peterson, 1972). There have been some 
demonstrations that the performance of 
boys is superior to that of girls on a reading- 
type task under certain classroom conditions 
(McNeil, 1964); in the McNeil study, 
orienting behavior was controlled through 
pupils and teachers of the Minneapolis public 
school system. We also acknowledge the contribu- 
tions of Karen Anderson, Bridgitte Schroeder, and 
Sharon Larsen, and Teara Archwamety, who as- 
sisted with data processing analysis. 
Samuels, Minnesota Reading Research Project, 
Center for Research in Human Learning, Uni- 
versity of Minnesota, Elliott Hall, Minneapolis, 
Minnesota 55455. 


placing students in cubicles and utilizing 
headphones for audio input. 

When asked why students succeed in a 
particular subject matter, teachers tend to 
attribute success to teacher-controlled var- 
iables such as methodology. On the other 
hand, academic failure is generally ex- 
plained by reference to intrastudent var- 
iables, such as lack of intelligence, readiness, 
motivation, or attention (Baldwin, Johnson, 
& Wiley, 1970). 

A recent study by Lahaderne (1968) re- 
ported that school achievement at Grade 
6 was related to attention. A similar finding 
was reported by Cobb (1972) for fourth- 
grade pupils. However, when measure- 
ments come this late in the child’s academic 
career, one cannot be sure if inattentiveness 
may not represent avoidance behavior 
resulting from academic failure. That is, 
lack of school success may lead to inatten- 
tiveness rather than the reverse, since lack 
of reinforcement in school settings generally 
leads to extinction of task-relevant be- 
haviors. 

By replicating Lahaderne's study in 
Grade 1, we hoped to determine whether 
attentiveness was related to academic 
achievement (i.e., reading) prior to the 
effects of long-term success-failure school ex- 
periences. In addition, the study was de- 
signed to determine if the expected superior 
reading achievement of girls was related 
to observed attentiveness in the classroom. 


in a reliability of 89%, which is similar to those 
reported by Cobb (83%) and Lahaderne (83%-100%). 


30 S. JAY SAMUELS AND JAMES E. TURNURE 


METHOD 


Subjects 


Eighty-eight first graders, 53 boys and 35 girls, 
were observed. The subjects were obtained from 
four classrooms in two middle-class schools in the 
Minneapolis school system. The teachers in these 
schools used traditional three-group reading 
methods and basal reader materials. 


Procedure 


An observer was assigned to each of the four 
classrooms to record the attentional behaviors of 
the pupils during the reading hour. Attention was 
defined and measured in a manner similar to that 
reported by Lahaderne (1968). Task-relevant be- 
haviors (i.e., orienting eyes to text or teacher, 
working on reading follow-up exercises, observing 
chalkboard or overhead projection, or otherwise 
following the instructional directions of the 
teacher) were scored as positive instances of at- 
tentiveness. Negative attentiveness consisted of 
nontask-orienting behavior such as failing to 
follow instructional directions, closing eyes, 
working or playing with nonassigned materials, 
etc. 

Attentive and inattentive behaviors were re- 
corded on a scoring sheet that listed the children's 
names according to their reading groups. Each 
child was observed in sequence in accordance with 
his listing on the scoring sheet. A six-second scor- 
ing method was used. A child was observed for 
four seconds, and in the next two seconds a plus 
(+) or minus (—) was entered on the scoring sheet 
representing the observer's judgment of task 
attentiveness or inattentiveness, respectively. 
Question marks signified ambiguous instances in 
which it was unclear to the observer whether or 
not the pupil was attentive. It was possible to 
record 600 observations during each reading hour. 
Observers made 15 visits to each classroom over 
the course of a month. The attentional data used 
for analysis was a proportion score comprised of 
the number of positive instances divided by the 
total number of both positive and negative in- 
stances; question marks were excluded. 
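
The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated procedure, not the authors' own tooling: the function names and the use of the characters "+", "-", and "?" to stand for the marks on the scoring sheet are assumptions made for the example.

```python
SECONDS_PER_HOUR = 60 * 60
# One scoring cycle: 4 seconds of observation plus 2 seconds of recording.
SCORING_CYCLE_SECONDS = 4 + 2


def observations_per_hour(cycle_seconds=SCORING_CYCLE_SECONDS):
    """Number of scoring cycles that fit into one reading hour."""
    return SECONDS_PER_HOUR // cycle_seconds


def attention_score(marks):
    """Proportion of positive marks among positive and negative marks.

    `marks` is an iterable of "+", "-", or "?" (hypothetical encoding).
    Question marks are excluded from the total, as in the text.
    Returns None when no scorable observation exists.
    """
    marks = list(marks)
    positives = marks.count("+")
    negatives = marks.count("-")
    scored = positives + negatives
    if scored == 0:
        return None
    return positives / scored
```

With a six-second cycle, `observations_per_hour()` gives the 600 observations per reading hour mentioned in the text, and a child marked "+", "+", "-", "?" would receive a score of 2/3.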


Reliability 


A videotape was made of first-grade children 
grouped around a table doing reading follow-up 
work. A 15-minute segment of the film was selected 
for training purposes, and a 2-minute segment not 
previously exposed was retained for the test of 
interrater reliability. 

Using the six-second scoring method described 
earlier, observer reliability was calculated by 
dividing the total number of agreements by the 
total number of recorded behaviors. This resulted 


Reading Achievement Measure 


Reading achievement was measured by pre- 
senting 45 words, randomly selected from the Dolch 
(1956) list of basic sight words, for recognition. 
Each of these words was typed onto a 3 × 5 inch 
card with a primary typewriter. The words were 
presented individually and the student was given 
up to six seconds to respond. The experimenter 
recorded all correct responses on a score sheet. No 
feedback was given on the test. 


Data Analysis 


Reading readiness scores were available and 
were used as a covariate in the analysis of word 
recognition. These readiness scores were also used 
in analyzing the data by means of partial 
correlations. Attention scores were placed in four 
quartiles: Q1 = .68 and below; Q2 = .69-.80; 
Q3 = .81-.87; and Q4 = .88 and above. A subject, 
for example, was placed in quartile Q1 if he was 
attentive 68% of the time or less. Attention and 
sex served as the two independent variables in the 
analysis of covariance of word recognition scores. 
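
As a minimal sketch of the categorization just described, the following Python function maps a proportion score onto the four attention categories. The function name is hypothetical, and rounding the score to two decimal places before comparing is an assumption, made because the cut points are reported to two decimals.

```python
def attention_quartile(score):
    """Assign an attention proportion score (0.0-1.0) to Q1-Q4.

    Cut points follow the text: Q1 = .68 and below, Q2 = .69-.80,
    Q3 = .81-.87, Q4 = .88 and above. The score is rounded to whole
    percentage points first (an assumption, see lead-in).
    """
    pct = round(score * 100)
    if pct <= 68:
        return "Q1"
    if pct <= 80:
        return "Q2"
    if pct <= 87:
        return "Q3"
    return "Q4"
```

For instance, a child who was attentive 68% of the time falls in Q1, and one who was attentive 88% of the time falls in Q4.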


RESULTS 


Sex Differences in Reading Readiness 
and Attention 


The mean reading readiness score for the 
boys was 67.43 (SD = 18.14), and the mean 
for the girls was 64.34 (SD = 27.18). Com- 
paring these two means by t tests indicated 
that the difference was not significant (t < 
1, df = 86). 

Comparing the two sexes on attention 
indicated that the mean attention score for 
the boys was .76 (SD = .13) and the mean 
for the girls was .84 (SD = .10). This dif- 
ference in attention was significant in favor 
of girls (t = 3.08, df = 86, p < .01). 
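
The two comparisons above can be approximately checked from the reported summary statistics alone. The sketch below assumes an independent-samples t test with pooled variance, which is consistent with the reported df of 86 (53 + 35 - 2) but is not stated explicitly in the text; the function name is hypothetical. Because the means and standard deviations are rounded, the recomputed values can differ slightly from the published ones.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance.

    Returns (t, df), where df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Attention: girls (n = 35) versus boys (n = 53).
t_attention, df_attention = pooled_t(0.84, 0.10, 35, 0.76, 0.13, 53)

# Reading readiness: boys (n = 53) versus girls (n = 35).
t_readiness, df_readiness = pooled_t(67.43, 18.14, 53, 64.34, 27.18, 35)
```

Under these assumptions the attention comparison comes out close to the reported t of 3.08, and the readiness comparison stays below 1, in line with the text.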


Attention and Reading Achievement 


Table 1 shows the mean word recognition 
scores for each of the attention categories 
and for the two sexes. Inspection of the table 
indicates that as attention increases, there is 
a corresponding increase in word recognition. 
Also, females have a higher recognition 
score in comparison to males. A Sex × 
Attention analysis of covariance of these 
scores, using reading readiness as the covar- 
iate, found the following: a significant main 
effect for attention (F = 8.46, df = 3/79, p 
< .001) and a significant main effect for 
sex (F = 3.96, df = 1/79, p < .05). The 
Sex × Attention interaction was not sig- 
nificant (F < 1). 

TABLE 1 
MEAN WORD RECOGNITION SCORES AND STANDARD 
DEVIATIONS FOR ATTENTION 
CATEGORIES AND SEX 

             Attention category               Sex 
Measure   Q1      Q2      Q3      Q4      Male    Female 
Mean      16.18   25.05   26.83   33.87   22.68   30.03 
SD        13.04   10.24   18.38   13.49   13.74   13.68 

Note. The four quartiles used are as follows: 
Q1 = .68 and below; Q2 = .69-.80; Q3 = .81-.87; 
and Q4 = .88 and above. 

Newman-Keuls tests were computed to 
determine which of the word recognition 
scores were significantly different from each 
other for each of the four attention cat- 
egories. The following was found: All com- 
parisons were significantly different from 
each other (p < .05) with the exception of 
Q2 and Q3. 

In order to facilitate comparability from 
the present study to the reports of Lahaderne 
(1968) and Cobb (1972), correlations be- 
tween attention and word recognition were 
computed. A Pearson product-moment cor- 
relation of .44 was obtained (p < .01). 
This correlation is close to the value found 
between attention and reading achievement 
by Cobb (r = .45) for the fourth grade and 
by Lahaderne (r = .51-.39) for sixth graders. 
The partial correlation in this study, con- 
ducted between the variables of attention 
and word recognition and controlling for 
reading readiness, was .44. 
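
For readers unfamiliar with the statistic, a first-order partial correlation can be computed from three zero-order correlations with the standard textbook formula. The sketch below is illustrative only: the function name is hypothetical, and the correlations of reading readiness with attention and with word recognition are not reported in the text, so the values in the usage note are made-up placeholders rather than results from this study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    Standard formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator
```

With placeholder values, `partial_correlation(0.44, 0.0, 0.0)` leaves the zero-order value of .44 unchanged, whereas `partial_correlation(0.44, 0.3, 0.3)` falls to about .38. A partial correlation that stays at the zero-order value, as reported here, therefore indicates that controlling for the covariate removed little of the attention and word recognition relationship.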


Discussion 


In this study, it was found that girls were 
significantly superior to boys in word rec- 
ognition, as had been previously reported, 
and significantly superior in classroom at- 
tentiveness as well. We also found that in- 
creasing degrees of attention were related to 
superior word recognition. Thus, like 
Lahaderne, and more recently Cobb and 
Hops (1973), we found that overt task- 
relevant orienting behavior was related to 
scholastic achievement; furthermore, this 
relationship was obtained in beginning 
reading before a long history of academic 
failure had been established. 

Environmental factors have an important 
effect on children’s scholastic achievement. 
In Germany, where male teachers pre- 
dominate, boys were found to be superior 
in reading achievement (Preston, 1962). 
McNeil (1964) reported that in a reading- 
type task, boys learned more than did girls 
when the learning situation was designed 
to approximate a mechanical manipulative 
activity, but there was a reversal in achieve- 
ment when these children were taught 
reading in the classroom by women. Sex 
differences in verbal learning are not found 
in laboratory studies in which attention is 
carefully controlled. 

One educational implication of the present 
study, as well as the studies reported by 
Lahaderne (1968) and Cobb (1972), is that 
contrary to teachers’ opinions that only low 
achievement is related to intrastudent 
variables (Baldwin et al., 1970), both high 
and low reading achievement appear to be 
related to attention. Another implication 
would be that the sex difference favoring 
girls frequently found in reading achieve- 
ment seems to be mediated by an attentional 
variable. Behavior modification literature 
indicates that orienting behavior can be 
controlled by the teacher (Packard, 1970; 
Walker & Buckley, 1968) and that such 
control of attention can be conducive to 
higher academic achievement (Cobb & 
Hops, 1973). Since the time of James 
(1890), psychologists have believed that 
attention is the sine qua non for learning. 
It should be pointed out to teachers that 
reading achievement has been found to be 
related to attention. Instructional success 
clearly requires that teachers secure and 
maintain the attention of all their pupils. 
In light of the finding that teachers tend to 
attribute academic failure to intrastudent 
variables, the viewpoint expressed by 
Goldiamond and Dyrud seems appropriate: 


The performance of the student may be to a 
considerable extent a function of the procedures 
used to establish that behavior; we should look to 
deficits in our own procedures before ascribing 
deficits to the students or difficulty to the problem 


[1966, p. 99]. 
CONCEPT-LEARNING PERFORMANCE 

This experiment investigated relationships between anxiety and 
concept learning. Two hypotheses were examined: (a) A-State would 
have a stronger debilitating effect on concept-learning performance 
than A-Trait and (b) task conditions would affect A-State. Sixty-one 
undergraduate subjects were randomly assigned to either high- or low- 
ego-involving conditions. All subjects performed extradimensional- 
shift concept-learning tasks. The performance of high-A-State subjects 
was significantly inferior to that of low-A-State subjects, while there 
were no differences in performance between high- and low-A-Trait 
subjects. Furthermore, high-ego-involving instructions increased A- 
State and task completion reduced A-State for these subjects. These 
data support Spielberger's state-trait interpretation of anxiety. 


Research investigating the relationship 
between anxiety and concept learning has 
produced inconsistent results and it has been 
proposed that factors such as task complex- 
ity, ego-involving conditions, and cognitive 
abilities be examined in an effort to clarify 
these results (Denny, 1966; Meyers & Dun- 
ham, 1972). Differentiating between trait 
and state anxiety might also help to produce 
more consistent research in this area, since 
there is extensive evidence that measures of 
anxiety that are more specific to the situa- 
tion relate more strongly to learning per- 
formance than general measures. For ex- 
ample, Sarason’s research (Sarason & Pal- 
ola, 1960) suggests that test anxiety has a 
stronger relationship to performance on 
testlike tasks than trait anxiety. Similarly, 
Spielberger’s (1966) research indicates that 
state anxiety (feelings of apprehension and 
heightened autonomic nervous system ac- 
tivity) is more likely to relate significantly 
to learning performance than trait anxiety 
(a generalized predisposition to experience 
state anxiety). 

The drive theory interpretation of anxiety 
(Spence & Spence, 1966) has been employed 


*Requests for reprints should be sent to Joel 
Meyers, Department of School Psychology, Temple 
University, College of Education, Philadelphia, 
Pennsylvania 19122. 


to explain the relationships between anxiety 
and concept-learning performance (Denny, 
1966; Maltzman, Fox, & Morrissett, 1953). 
According to this position, the learning per- 
formance of high-anxious subjects will be 
inferior to that of low-anxious subjects on 
tasks with competing responses (i.e., incor- 
rect response tendencies that are higher in 
the habit hierarchy than correct response 
tendencies). On tasks low in competing re- 
sponses (i.e., where the correct response is 
dominant in the habit hierarchy relative to 
incorrect responses), it is predicted that per- 
formance of high-anxious subjects will be 
superior to that of low-anxious subjects. 
However, it is not easy to assess the effects 
of anxiety in terms of drive theory, because 
it is difficult to determine accurately the 
number of competing responses. 

Perhaps drive theory could be more ade- 
quately evaluated by using extradimen- 
sional-shift concept-learning tasks, since it 
may be possible to estimate the number of 
competing responses associated with these 
tasks. This type of problem involves an 
unannounced shift in the solution, as the re- 
sponses that are originally learned are in- 
correct in the final portion of the task. Con- 
sequently, the incorrect response tendencies 
should be higher in the habit hierarchy than 
the correct responses. 


34 JOEL MEYERS AND ROY MARTIN 


In one previous investigation using this 
type of task, Meyers and Dunham (1972) 
found no significant main effects for anxiety 
on concept-learning performance. However, 
these researchers measured trait rather than 
state anxiety. For the reason outlined above, 
it is hypothesized that state anxiety would 
be a more adequate measure of anxiety as a 
drive. Consequently, the present investiga- 
tion is an extension of the one reported by 
Meyers and Dunham in that state anxiety 
was assessed in addition to trait anxiety. 

Whereas much important work investi- 
gating the relationship between anxiety and 
learning has been oriented to determining 
the effects of anxiety on learning perform- 
ance, recent investigations have also begun 
to determine effects of the task on anxiety 
experienced by subjects. For example, 
Martin (1970) reported that qualifying ex- 
aminations for doctoral candidates created 
substantial levels of state anxiety and that 
the degree of anxiety increased as the date 
of the exam approached. In addition, Martin 
and Meyers (1972) have shown that the 
level of state anxiety increased as the date 
of a final examination became more proxi- 
mate. Finally, O'Neill, Spielberger, and 
Hansen (1969) found that the level of diffi- 
culty of a task had a significant effect on 
the state anxiety experienced by subjects. 

In view of the literature cited regarding 
the effects of state and trait anxiety on con- 
cept-learning performance and the literature 
regarding the effects of task conditions on 
state anxiety, the purpose of this investiga- 
tion was twofold: (a) to determine whether 
state anxiety would have a stronger debili- 
tating effect on concept-learning perform- 
ance than trait anxiety and (b) to determine 
whether certain aspects of the experimental 
situation would have a significant effect on 
the level of state anxiety experienced by 
subjects. 


METHOD 


Subjects 


Subjects were 61 introductory educational psy- 
chology students at the University of Texas. They 
were randomly assigned to two experimental 
treatments, which consisted of either high- or low- 
ego-involving instructions (adapted from Sarason, 
1956). There were 32 subjects in the high-involve- 
ment condition and 29 subjects in the low-involve- 
ment condition. Originally, there were 66 sub- 
jects; however, 5 subjects who did not complete 
the task were not included in the analysis. 


Problem 


Unidimensional concept-learning tasks were used 
in this experiment. There were two samples fol- 
lowed by two experimental problems, and these 
tasks were graded in difficulty. The first sample 
had only one dimension and it was relevant to 
the solution; the second sample contained three 
dimensions with one dimension relevant to solu- 
tion; and the experimental problems consisted of 
five dimensions with one relevant to solution of 
the task. 

The experimental-concept task was an extra- 
dimensional-shift task involving negative transfer, 
which consisted of two consecutive unidimensional, 
four-category problems with five dimensions. The 
first of these two problems is referred to as the 
original-learning problem, and the second problem 
is referred to as the transfer-learning problem. 

A deck of 1,024 3 × 3 inch cards was derived 
from all possible combinations of the words found 
in Table 1 with the restriction that every card 
contain one word from each of five dimensions. 
These five dimensions were tools, fruit, vehicles, 
clothes, and furniture. There were four possible 
words representing each dimension, and these are 
the words found in Table 1. The placement of 
dimensions on each card was random so that no 
dimension would become associated with a par- 
ticular position. The cards were placed in a random 
order, and to ensure a random sequence of stimuli, 
starting points were chosen randomly for each 
subject. 
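
The deck construction described above can be sketched in Python as follows. Four words on each of five dimensions yield 4^5 = 1,024 distinct cards. The function names are hypothetical, and treating a "random starting point" as a rotation of the shuffled deck that wraps around to the beginning is an assumption; the text does not say how the end of the deck was handled.

```python
import itertools
import random

# Words for each dimension, as listed in Table 1 of the text.
DIMENSIONS = {
    "tools": ["hammer", "pliers", "nails", "saw"],
    "fruit": ["cherry", "banana", "apple", "pear"],
    "vehicles": ["plane", "train", "car", "boat"],
    "clothes": ["hat", "shoes", "shirt", "pants"],
    "furniture": ["bed", "couch", "table", "chair"],
}


def build_deck(seed=None):
    """Return all word combinations as cards, in random order.

    Each card is a list of (dimension, word) pairs whose order is
    shuffled so that no dimension is tied to a position on the card.
    """
    rng = random.Random(seed)
    names = list(DIMENSIONS)
    deck = []
    for words in itertools.product(*(DIMENSIONS[name] for name in names)):
        card = list(zip(names, words))
        rng.shuffle(card)  # random placement of dimensions on the card
        deck.append(card)
    rng.shuffle(deck)  # random order of cards in the deck
    return deck


def presentation_order(deck, rng):
    """Start at a random card and wrap around (assumed behavior)."""
    start = rng.randrange(len(deck))
    return deck[start:] + deck[:start]
```

A call such as `presentation_order(build_deck(seed=1), random.Random(2))` would then give one subject's sequence of stimulus cards; the seeds are arbitrary and serve only to make the example repeatable.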

While looking at a series of the above stimulus 
cards, one at a time, the subject had to de- 
termine into which of four possible response cate- 
gories (1, 2, 3, or 4) each observed card belonged. 
In reference to Table 1, either tools, fruit, vehicles, 
or clothes was the dimension relevant to solution 
on the original-learning problem. For example, 
if tools was the relevant dimension, the subject 
may have had to learn that hammer = 1, pliers = 
2, nails = 3, and saw = 4, and in this instance the 
remaining four dimensions (fruit, vehicles, clothes, 
and furniture) would be irrelevant to the solu- 
tion of the problem. 

TABLE 1 
STIMULUS MATERIALS IN THE CONCEPT-LEARNING 
PROBLEM 

                     Dimension 
Tools     Fruit     Vehicles   Clothes   Furniture 
hammer    cherry    plane      hat       bed 
pliers    banana    train      shoes     couch 
nails     apple     car        shirt     table 
saw       pear      boat       pants     chair 

The same deck of cards was used in the transfer- 
learning problem. While either tools, fruit, vehicles, 
or clothes was the dimension relevant to solution 
of the original-learning problem, the transfer prob- 
lem involved a shift to a different dimension (fur- 
niture) as the basis to solution of the problem. 
Therefore, on the transfer-learning problem, sub- 
jects had to learn different response associations 
such as sofa = 1, table = 2, desk = 3, and chair = 4. 


Procedure 


All of the data were collected during individual 
testing sessions (i.e., one session per subject). 
During each session the subject was asked to fill 
out a short form (State Anxiety 1) of the Spiel- 
berger State Anxiety Inventory followed by the 
Spielberger Trait Anxiety Inventory (Spielberger, 
Gorsuch, & Lushene, 1970). The short form of the 
state anxiety inventory was used so that when state 
anxiety was measured during the task, the mea- 
surement would produce minimal interference with 
performance. 

After the subject completed the trait anxiety inventory, 
the experimenter explained the nature of the task 
to the subject. Then the two sample problems were 
presented. Each sample and the experimental 
problem were administered by placing one stimulus
card at a time on the table in view of the subject.
After the subject indicated which of the four
response categories (1, 2, 3, or 4) he thought was
appropriate for each card, the experimenter indicated 
whether the response was correct or incorrect. 
When the response was wrong, the experimenter 
stated the correct answer. 

Following a criterion of eight consecutive cor- 
rect responses on each of the two samples, all 
subjects received one of the following sets of in- 
structions designed to produce either high or low 
involvement with the task. 


High ego-involvement: 


Do I have your name spelled correctly? How 
old are you? What is your major? What 
classes are you in? What is your grade point 
average? Do you expect to get a Bachelor's 
degree? The first problem was a practice 
problem. Now we will begin the next prob- 
lem. This is similar to a short-form intelli- 
gence test in that it involves the ability to 
think in abstract terms. Pay close attention 
to each card since each one missed lowers your 
score when it is compared with other people 
your age. 


Low ego-involvement: 


Your performance was fine on this problem. 
Now I am going to give you another practice 
problem. What concerns me is not how well 
you do, but rather the characteristics of the 
problem which are uncovered. This is a common
procedure in this sort of experiment. 


The second measure of state anxiety (A-State 
2) was administered immediately following these 
instructions, and then the experimental concept- 
learning task was begun by presenting the original- 
learning problem. When a criterion of 13 consecu- 
tive responses was reached, there was an un- 
announced shift in the solution and the transfer- 
learning problem was begun. Rather than one of 
the other four dimensions, furniture was now the 
dimension relevant to solution. Performance was 
terminated when a criterion of 13 consecutive re- 
sponses was reached on the transfer problem. 
State anxiety was assessed at three additional 
points during the task, producing a total of five 
A-State measures: early during the original-learn- 
ing problem (A-State 3), early during the transfer 
problem (A-State 4), and late in the performance 
of the transfer problem (A-State 5). 


RESULTS 


Effects of Anxiety on Performance 


There were two treatment groups derived 
from the two ego-involving conditions, and 
multiple linear regression procedures were 
used to determine the relationship between 
each anxiety variable and performance in 
these two conditions. The number of errors 
on the original- and on the transfer-learning 
problems were the two measures used to 
assess performance. 

The regression analyses included two 
steps. First, an F test was computed to de- 
termine whether the relationship between 
anxiety and performance was significantly 
different in the two treatment conditions. 
Since no significant differences were found 
between treatments, the subjects from these 
two conditions were combined into one 
group and a second analysis was completed 
to determine whether there was a significant 
relationship between anxiety and perform- 
ance. This relationship was defined in terms 
of the slope of the regression line of anxiety 
on performance, and an F test was used to 
determine whether the slope was signifi- 
cantly different from zero. 
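
The second step can be illustrated with a minimal Python sketch of the F test for a single regression slope. The function names and the five-pair data set are hypothetical and are not taken from the study. The sketch assumes one predictor, in which case the F ratio is identical whichever variable is treated as the criterion. The helper `r_squared_from_f` uses the identity R² = F / (F + df_error), which holds for a test with one numerator degree of freedom; for instance, an F of 8.09 with 59 error degrees of freedom corresponds to about .121.

```python
from statistics import mean


def slope_f_test(x, y):
    """Return (slope, F, df_error) for the regression of y on x."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    ss_regression = slope * sxy        # variation explained by the line
    ss_error = syy - ss_regression     # residual variation
    df_error = n - 2
    f_value = ss_regression / (ss_error / df_error)
    return slope, f_value, df_error


def r_squared_from_f(f_value, df_error):
    """Proportion of variance implied by a 1-df F ratio."""
    return f_value / (f_value + df_error)


# Hypothetical scores, for illustration only.
anxiety = [1, 2, 3, 4, 5]
errors = [2, 4, 5, 4, 5]
slope, f_value, df_error = slope_f_test(anxiety, errors)
```

A slope reliably greater than zero in such an analysis would correspond to a debilitating relationship, since higher anxiety would go with more errors.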

Original learning. The first three measures 
of state anxiety were administered prior to 
or during the early portions of the original- 
learning problem, and consequently it was 
predicted that these three state anxiety 
measures would have significant debilitating 
effects on performance of the original-learning
problem. Table 2 reveals that this prediction
was confirmed, as A-States 1, 2, and
3 were all significantly related to performance
of the original-learning problem. In
contrast to state anxiety, it was predicted
that trait anxiety would not have a significant
relationship to performance, and Table
2 reveals that this expectation was also confirmed.


TABLE 2
MULTIPLE LINEAR REGRESSION ANALYSIS

                   Original-learning     Transfer-learning
                        problem               problem
Anxiety measure      F^a       R²          F         R²
Trait anxiety         .10     .002        1.82      .030
State 1              8.09***  .121        3.17*     .051
State 2              4.74**   .074        2.90*     .047
State 3              4.15**   .066        2.55      .041
State 4              1.95     .032        1.90      .031

^a df = 1/59.
* p < .10.
** p < .05.
*** p < .01.

The relationships between anxiety and 
performance of the original-learning prob- 
lem are depicted graphically in Figure 1 in 
order to help describe these results more 
clearly. For example, it is clear that there is 
no relationship between A-Trait and per- 
formance, as the number of errors is essen- 
tially the same for subjects one standard 
deviation above the mean (high A-Trait) 
and for subjects one standard deviation be- 
low the mean (low A-Trait) in A-Trait. 

In contrast to trait anxiety, Figure 1 reveals
that the significant relationships between
the first three measures of state 
anxiety and original-learning performance 
were all clearly debilitating. Subjects scor- 
ing one standard deviation above the mean 
in state anxiety (high A-State) had signifi- 
cantly more errors on the original-learning 
problem than subjects scoring one standard 
deviation below the mean in state anxiety 
(low A-State). Consistent with predictions, 
Figure 1 demonstrates that there were de- 
bilitating relationships between A-State and 
original-learning performance that were not 
found for A-Trait in this study. 


Finally, the fourth measure of state anxiety
was administered during the early portions
of the transfer-learning problem, after 
the original-learning problem had been com- 
pleted. Consequently, a significant debilitat- 
ing relationship between this variable and 
performance of the original-learning prob- 
lem was not expected. Table 2 revealed that 
this expectation was also confirmed. 

Transfer learning. The comparisons be- 
tween state anxiety and trait anxiety are 
less clear during the transfer-learning prob- 
lem. Similar to original learning, Table 2 
reveals that trait anxiety did not relate sig- 
nificantly to performance of the transfer 
problem. However, Table 2 also reveals that 
there were no significant relationships to 
transfer performance for any of the measures
of A-State, although the relationships 
for A-States 1 and 2 did approach signifi- 
cance. While not significant, the direction 
of these relationships was such that they 
indicated debilitating effects for A-State and 
A-Trait on transfer learning. 


Factors Affecting State Anxiety 


In addition to investigating the effects of 
anxiety on concept-learning performance, 
an equally important goal was to examine 
the effects of the experimental situation on 
the state anxiety experienced by subjects. 
Specifically, it was felt that task performance,
task intervals (i.e., the point in time 
during the task when A-State is assessed), 
and the ego-involving instructions would 
have some effects on state anxiety. 

The relationships between task perform- 
ance and state anxiety were examined with 
a Pearson product-moment correlation be- 
tween performance of the transfer-learning 
problem and subjects’ scores on the fifth 
measure of A-State. Since this fifth measure 
was taken when the task was almost fin- 
ished, it was assumed that performance of 
the transfer task had an effect on the level of 
anxiety rather than the reverse. This was the 
rationale for using a correlational analysis 
rather than the multiple regression proced- 
ures used with the preceding four measures 
of A-State. As anticipated there was a moderate
relationship (r = .35, p < .01), which 
was negative in the sense that poor performance
was associated with high anxiety 
and that performance improved as anxiety 
decreased. However, it was found that this 
relationship was dependent on the ego-in- 
volving instructions. 

Specifically, for subjects with high-ego- 
involving instructions the negative relation- 
ship between performance and state anxiety 
was predictably high (r = .49, p < .01). On 
the other hand, an unexpected finding was 
the lack of relationship between perform- 
ance and A-State for subjects in the low- 
ego-involving condition (r = .14, ns). 
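
The correlational analysis can be sketched in Python as follows. The function names are illustrative, and the sample size of 61 is an assumption inferred from the reported error degrees of freedom (59), not a figure stated in the text; the subgroup sizes are not reported, so only the combined correlation is converted to a t ratio here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t ratio (n - 2 df) for testing that a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


ASSUMED_N = 61  # assumption: inferred from df = 1/59, not stated in the text
t_combined = t_for_r(0.35, ASSUMED_N)
```

Under that assumed sample size, the t ratio for r = .35 is about 2.87, which exceeds the two-tailed .01 critical value of roughly 2.66 for 59 degrees of freedom and so agrees with the reported significance level.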

A repeated-measures analysis of variance 
was computed in order to further delineate 
the effects of the experimental situation on 
A-State. More specifically, this analysis was 
designed to consider the effects of both ego- 
involving instructions and task intervals on 
A-State. Consequently, the two groups of 
subjects receiving high- or low-ego-involving
instructions formed the two treatment 
conditions, and the five administrations of 
the A-State questionnaires accounted for 
trials, as these anxiety scores were the de- 
pendent measures. 

There was a significant main effect for 
trials on the state anxiety measure (F =
5.04, df = 4/236, p < .001), and this indicated
that A-State increased following the 
ego-involving instructions, maintained a 
high level throughout task performance, and 
decreased as completion of the task approached.

There was also a significant Ego Involve- 
ment x Trials interaction (F = 3.76, df = 
4/236, p < .01), and this finding provides 
an important elaboration to the main effect. 
Consistent with the above results, an in- 
spection of the means in Table 3 reveals 
that under high-ego-involving instructions, 
state anxiety increased subsequent to the 
instructions, maintained this high level 
throughout the task, and decreased as task 
completion approached. On the other hand, 
for subjects receiving low-ego-involving in- 
structions, Table 3 reveals that there was es- 
sentially no change in A-State. Specifically, 
for these subjects there was no increase in 
A-State following the instructions, and there 
was no decrease in A-State as completion 
of the task approached. Consequently, both 
the correlational data and the repeated- 
measures analysis of variance support the 
notion that task conditions have an impact 
on state anxiety and that ego-involving in- 
structions can modify the relationships be- 
tween the task and state anxiety. 


DISCUSSION 


One important goal of this investigation 
was to continue recent attempts to broaden 
anxiety research by considering task per- 
formance and experimental conditions as 
they affect state anxiety (Martin, 1970; 
Martin & Meyers, 1972; O’Neill et al., 
1969), rather than limiting questions to the 
effects of anxiety on learning performance. 
This newer area of research is becoming 
more feasible as instruments such as Spielberger's
State Anxiety Inventory continue 
to be developed, and the available research, 
including the present study, indicates that 
this is a productive direction for further
investigation.


TABLE 3
GROUP MEANS AND STANDARD DEVIATIONS FOR STATE ANXIETY

                          State
Involvement     1      2      3      4      5
High
  M            7.8    9.7    9.7    9.5    8.8
  SD           2.5    3.5    3.1    3.5    3.1
Low
  M            8.8    8.8    8.8    8.9    8.7
  SD           2.9    3.1    2.9    3.3    3.1

Specifically, in this study there were complicated
relationships between the experimental
situation and A-State. It was expected
that the ego-involving instructions 
would have an impact on A-State, and this 
occurred in two ways. First, as expected, 
high-ego-involving instructions increased A- 
State; whereas low-ego-involving instruc- 
tions did not have this effect. Second, 
low-ego-involving instructions consistently 
eliminated the effects of task on state anxiety.
For example, as was predicted from 
past research (Martin, 1970; Martin & 
Meyers, 1972), there was a negative rela- 
tionship between task performance and 
A-State. However, this relationship occurred 
for subjects with high-ego-involving instruc- 
tions and not for those with low-ego-involv- 
ing instructions. Similarly, for subjects with 
high-ego-involving instructions there was a 
decrease in anxiety as completion of the 
task approached, and this confirms prior 
results (O'Neill et al., 1969). However, this 
effect was not obtained for subjects receiving
low-ego-involving instructions in the 
present study. 

It is potentially important that the effects 
of the task on A-State were obtained under 
relatively threatening conditions and elimi- 
nated when less threatening conditions were 
employed. An implication of this finding is 
that there may be ways to control the anx- 
iety-provoking aspects of testlike situations 
by altering the instructions given for the 
test. Research would, of course, be necessary 
to confirm and elaborate this finding. 

The second goal of the present investigation
was to determine whether state anxiety 
would be more strongly related to performance
of a concept-learning task than would trait 
anxiety. The significant relationships between
state anxiety and performance of the 
original-learning problem combined with a 
lack of significant relationships between 
trait anxiety and performance confirmed 
this expectation and provided strong support 
for Spielberger's state-trait interpretation of 
anxiety. 

The results for state anxiety were in the 
predicted direction; that is, the perform- 
ance of high-anxious subjects was inferior 
to that of low-anxious subjects, and this 
supports the Spence interpretation of anx- 
iety as a drive (Spence & Spence, 1966). 
However, the transfer problem was designed 
to provide the strongest test of drive theory, 
since the unannounced shift in solution of 
the task was expected to create high num- 
bers of competing responses. State anxiety 
did not significantly relate to transfer per- 
formance, so the predictions of drive theory 
were not fully supported. 

While the results obtained during transfer 
learning are not supportive of drive theory, 
additional work would be needed to assess 
drive theory in this context. In order to pro- 
duce a satisfactory test of drive theory un- 
der these conditions, it would have been 
necessary to have a second transfer condition
to serve as a control. This second condition
would be one in which an entirely 
new set of stimulus cards containing new 
dimensions and values was used on the 
transfer problem. In this situation, the sub- 
ject would be aware of a new problem begin- 
ning, and it would be less likely that the 
previously learned responses would compete 
with performance. The test of drive theory 
would be to determine whether the relation- 
ship between state anxiety and performance 
would be relatively more debilitating on the 
first transfer condition than on this second 
transfer condition. Such an investigation is 
currently in progress. 

It is important to consider the present 
investigation as one of a series of studies at- 
tempting to clarify past research that has 
inconsistently reported debilitating effects 
of anxiety on concept learning. The thrust 
of this series of research is to demonstrate 
that a consideration of cognitive aptitudes 
and state anxiety would help to produce 
more consistent results. Presently, two of 
these studies have been completed using the 
same concept-learning task, and as expected, 
when trait anxiety has been considered by 
itself, there have been no significant relationships
to concept-learning performance. 
On the other hand, when trait anxiety was 
considered in conjunction with several aptitude
measures (Meyers & Dunham, 1972) 
and when state anxiety was considered in 
the present investigation, significant debili- 
tating relationships between anxiety and 
concept-learning performance have been 
found. Consequently, there is preliminary 
evidence suggesting the importance of considering
aptitudes and state anxiety in research
investigating the relationships between
anxiety and concept learning. 


Subjects were randomly assigned to one of four treatment groups
defined by position (before or after) and type (lower order vs. higher
order) of question placed in prose material. A control group had no
questions in text. Before instruction, subjects received five aptitude
tests. Instruction consisted of a 1,525-word prose passage. Immediately 
after and two weeks following instruction, subjects were tested.
Differences in group means on four measures of achievement usually were 
statistically significant but small. A vocabulary test interacted with 
treatments. Subjects with low vocabulary scores might be assigned to 
text material with higher order questions placed after a prose passage, 
while subjects with high vocabulary scores might be assigned to text 
without questions. 


The concept of mathemagenic activities 
(Rothkopf, 1965, 1970) has focused attention
on intended and incidental learning as 
a consequence of adjunct questions inserted 
within prose material. The effect of question 
position (before vs. after prose passages) 
has been examined thoroughly. In general, 
a consistent facilitative learning effect occurs
when questions are inserted after rather 
than before prose passages (Frase, 1968; 
Rothkopf, 1966; Rothkopf & Bisbicos, 
1967). More recently the effects of the type 
of question (e.g., application, synthesis, or 
higher order question vs. factual or lower 
order question) have been explored (Allen, 
1970; Hunkins, 1968; Tenenberg, 1969; 
Watts & Anderson, 1971). For example, 
Hunkins’ data support the hypothesis that 
higher order questions prompt more thorough
study and cognitive reorganization of 
the material, while lower order questions 
influence attention to facts. Similarly, Watts 
and Anderson found that application questions,
as contrasted to questions that repeated
examples or questions on names of 
significant people, yield the greatest general 
facilitative effect on learning by prompting 
the student to inspect and comprehend the 
text more thoroughly. Carroll (1971), in a 
review of this area, speculated that “questions
are most effective when they not only 
cause memory search, but also cause some 
sort of reorganization of memory traces and 
associations [p. 164].” 


* Requests for reprints should be sent to Richard 
J. Shavelson, who is now at the Graduate School of 

When mean differences between treatment 
groups are examined, the effects of position 
and type of question are statistically sig- 
nificant, though usually small. But suppose 
that for particular subjects with similar 
scores on personality or ability measures, 
the effects of these variables are quite different
from the effects for some other group of 


POSITION AND TYPE OF QUESTION AND PROSE LEARNING 41 


subjects in the treatment. Such effects can- 
not be detected when a mean is calculated 
for all subjects in the treatment. A certain 
combination of treatment variables, say 
higher order questions inserted after a pas- 
sage, may result in a facilitative learning 
effect for a particular subgroup of subjects. 
For different subjects these treatment varia- 
bles may not be facilitative, though such an 
effect could occur in some other treatment. 
These are hypotheses about aptitude by 
treatment interactions, or simply, ATI (see 
Berliner & Cahen, 1973; Cronbach & Snow, 
1969). 

Two ATI studies that are related to the 
present investigation have been completed. 
Berliner (1971) examined the effects of 
factual questions placed at specified inter- 
vals in a lecture presentation. In many 
analyses, for subjects with low scores in 
memory ability, questions inserted after 
lecture segments facilitated learning as 
measured by short-answer immediate- and 
delayed-retention tests. For subjects with 
high scores in memory ability, the questions 
may have interfered with learning. In a 
prose-learning study, Hollen (1971) found 
disordinal interactions between associative 
memory and questioning treatments. When 
the treatment did not contain adjunct ques- 
tions, the need for associative memory was 
maximized (i.e., there was a positive slope 
for the regression of a learning measure on 
memory ability). However, when adjunct 
questions constituted the treatment, the 
need for associative memory was minimized 
(i.e., there was a negative slope for the
regression of a learning measure on memory 
ability). To optimize treatment effects, then, 
subjects with low scores on memory ability 
might be assigned to the adjunct-question 
treatment and subjects with high scores 
might be assigned to a question-free treat- 
ment. 
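
The logic of a disordinal interaction can be sketched in Python. The two regression lines below use hypothetical intercepts and slopes chosen only to show a positive slope without questions and a negative slope with adjunct questions; they are not Hollen's (1971) estimates, and the function names are illustrative.

```python
def crossover_point(intercept_a, slope_a, intercept_b, slope_b):
    """Aptitude score at which two within-treatment regression lines cross."""
    if slope_a == slope_b:
        raise ValueError("parallel lines do not cross")
    return (intercept_b - intercept_a) / (slope_a - slope_b)


def assign_treatment(aptitude, no_question_line, question_line):
    """Pick the treatment with the higher predicted learning score."""
    predicted_no_q = no_question_line[0] + no_question_line[1] * aptitude
    predicted_q = question_line[0] + question_line[1] * aptitude
    return "adjunct questions" if predicted_q > predicted_no_q else "no questions"


# Hypothetical (intercept, slope) pairs, for illustration only.
NO_QUESTION_LINE = (10.0, 1.0)    # positive slope on memory ability
QUESTION_LINE = (20.0, -0.25)     # negative slope on memory ability

cross = crossover_point(*NO_QUESTION_LINE, *QUESTION_LINE)
low_memory_choice = assign_treatment(4.0, NO_QUESTION_LINE, QUESTION_LINE)
high_memory_choice = assign_treatment(12.0, NO_QUESTION_LINE, QUESTION_LINE)
```

With these illustrative coefficients the lines cross at a memory score of 8, so subjects below that point would be assigned to the adjunct-question treatment and subjects above it to the question-free treatment.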

The Berliner and Hollen studies raise 
questions about the main effects of adjunct 
questions inserted in instructional material. 
The purpose of this study is to extend the 
examination of interactions between a sub- 
ject's aptitudes and instructional treatments 
that differ in position and type of questions 
inserted in prose material. 


METHOD 


Subjects 


Eighty-seven volunteer subjects from a local 
junior college were placed at random into one of 
five groups: (a) LB = lower order questions before 
text (n = 18); (b) LA = lower order questions after 
text (n = 21); (c) HB = higher order questions before
text (n = 20); (d) HA = higher order questions
after text (n = 13); and (e) C = control, no 
questions in text (n = 15). The proportion of male 
subjects ranged between .22 in Group LB and .45 
in Groups HB and C. The mean age ranged between
21.3 years in Group LB and 24.2 years in 
Group C. 


Materials 


The instructional material, entitled “The Lisbon 
Earthquake,” reported on the earthquake in Lis- 
bon, Portugal, in 1755 and the historical and philo- 
sophical events that accompanied the quake. The 
material was selected for its novelty, reading level, 
reading time, reported interest, and its correspond- 
ing test items. Kropp, Stoker, and Bashaw (1966) 
constructed test items on “The Lisbon Earth- 
quake” and had high school subjects read the text 
and answer the questions. Their data analysis 
showed that, in general, the empirical classifica- 
tion of items agreed with an a priori classification 
based on Bloom (1956). From the empirically de- 
termined taxonomic classification of test items, 
the questions for this study were selected. Lower 
order questions required subjects to demonstrate 
knowledge: “The size of the tidal wave which hit 
the Lisbon Harbor Area was: (a) 30 feet; (b) 40 
feet; (c) 50 feet; (d) 60 feet.” Higher order questions
required subjects to demonstrate comprehension,
application, and analysis (cf. Bloom, 1956): 
“A ‘mental seismograph’ is a (a) scientific device for 
detecting ideas; (b) figure of speech for the mind; 
(c) mental record; (d) mechanical device for re- 
cording earthquakes." 

The 1,525-word text was divided into eight sections,
the first seven consisting of two paragraphs 
each and the last section of four short paragraphs. 
For the four experimental groups, a lower or higher 
order multiple-choice question was inserted either 
before or after each section of text. Thus, a total 
of eight questions was inserted in the text for 
each experimental group. For Groups LB and HB, 
questions were inserted before each section of text 
and repeated, with the correct answers, at the end 
of the section. For Groups LA and HA, questions 
were placed after each section of text and repeated, 
with the correct answer, on the following page. The 
control group read the text without inserted ques- 
tions and answers. 

Questions were assigned to the text and the 
achievement test in the following way. A pool of 
32 multiple-choice questions was created by in- 
cluding 2 lower order and 2 higher order questions 
from each of the text's eight sections. For the 
lower order question treatment, 8 lower order questions
were randomly assigned to both the text and 
the achievement test. The remaining 8 lower order 
questions appeared on the achievement test only. 
The 16 higher order questions were distributed in 
the same way. The achievement test, then, used all 
32 items. It contained 8 lower order questions that 
appeared in the text, 8 higher order questions that 
appeared in the text, and 16 questions that were not 
used in the text. 
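
The assignment of questions just described can be sketched in Python. The tuple labels, the function name, and the fixed seed are hypothetical. The sketch also assumes that, within each section and question type, one of the two pooled questions was chosen at random for the text, which is consistent with one question being inserted per section but is not spelled out in the procedure.

```python
import random

SECTIONS = range(1, 9)          # the text's eight sections
LEVELS = ("lower", "higher")    # question types


def build_question_sets(seed=0):
    """Return (pool, in_text, test_only) for the 32-item question pool."""
    rng = random.Random(seed)   # fixed seed: illustrative assumption
    # Two lower order and two higher order questions per section.
    pool = [(s, level, k) for s in SECTIONS for level in LEVELS for k in (1, 2)]
    in_text = {level: [] for level in LEVELS}
    for s in SECTIONS:
        for level in LEVELS:
            candidates = [q for q in pool if q[0] == s and q[1] == level]
            # Assumed rule: pick one of the pair at random for the text.
            in_text[level].append(rng.choice(candidates))
    used = set(in_text["lower"]) | set(in_text["higher"])
    test_only = [q for q in pool if q not in used]
    return pool, in_text, test_only


pool, in_text, test_only = build_question_sets()
```

Every pooled question appears on the achievement test; a treatment group sees in its text only the 8 in-text questions of its own type.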


Instruments 


Five aptitude measures were administered. 
Three of these tests were from the battery collected 
by French, Ekstrom, and Price (1963). The Ad- 
vanced Vocabulary Test (Part 2) measures verbal 
comprehension. It is a test of the ability to un- 
derstand the English language. The Hidden Fig- 
ures Test (Part 1) measures "the ability to keep 
one or more definite configurations in mind so as 
to make identifications in spite of perceptual dis- 
tractions [p. 9].” The Advanced Vocabulary and 
Hidden Figures Tests may be considered tests of 
different general ability (intelligence) factors. Since 
different treatments placed differential emphasis on 
verbal comprehension and the ability to hold ideas 
in mind during instruction, these tests might be 
expected to interact with the treatments. The third 
test, Letter Span (Part 1) measures “the ability 
to recall perfectly for immediate reproduction a 
series of items after only one presentation of the 
series [p. 26].” This test was chosen because the 
ability measured had previously been shown to 
interact with questioning treatments (Berliner, 
1971; Hollen, 1971). 

The fourth test was the Taylor Manifest Anxiety 
Scale (Taylor, 1951). In this study, the test was 
labeled “Biographical Inventory.” It positions subjects
on an anxious to nonanxious continuum. It 
was included because a pilot study showed that 
high-anxious subjects performed better on a learn- 
ing measure when questions preceded text material, 
whereas low-anxious subjects performed better 
when questions followed text material. Presumably 
the structure provided by presenting the questions 
before reading a passage attenuated their anxiety 
and permitted them to concentrate on learning the 
material. 

The fifth test, memory for semantic implications,
was constructed specifically for this study. 
This test was designed to reflect Guilford's (1967) 
description of an ability in his structure-of-intellect 
model. The test purports to measure the ability 
to remember and transform information presented 
in written material. The need for such a test was 
based on a psychological analysis of the cognitive 
requirements for learning prose material with ad- 
junct questions. An answer to a higher order ques- 
tion seemed to require the ability to remember 
and transform the information in the text. This 
theorizing corresponds to Carroll’s (1971) hypothesis
about the effects of cognitive reorganization
noted above. The reliability (Kuder-Richardson
Formula 21) of the specially constructed test 
was .85. 
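
Kuder-Richardson Formula 21 estimates reliability from the number of items, the test mean, and the test variance, on the assumption that all items are of equal difficulty. A minimal Python sketch follows; the item count, mean, and variance are hypothetical, since those values are not reported for the memory for semantic implications test.

```python
def kr21(n_items, mean_score, variance):
    """Kuder-Richardson Formula 21 reliability estimate."""
    if n_items < 2 or variance <= 0:
        raise ValueError("need at least two items and a positive variance")
    correction = n_items / (n_items - 1)
    return correction * (1.0 - mean_score * (n_items - mean_score)
                         / (n_items * variance))


# Hypothetical test statistics, for illustration only.
example_reliability = kr21(n_items=20, mean_score=12.0, variance=16.0)
```

With these illustrative values the estimate is about .74; a larger variance relative to the number of items raises the estimate.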

The achievement test has already been de- 
scribed. Most of the items were taken from Kropp 
et al. (1966). Eight new lower order questions were 
constructed because of the need to have equal 
numbers of questions corresponding to each passage 
in text. The correlation between achievement test 
scores for immediate and retention testing, a rough 
index of test-retest reliability, was .71. 


Procedure 


The experiment was conducted over three one- 
hour sessions. In the first session, the experimenter 
explained the sequence of the study, assured con- 
fidentiality of test results, prompted the subjects 
to do their best, and explained that the purpose of 
the study was “to investigate how people learn, 
particularly how they learn from written materials.”
Following the introduction, test packages 
containing the five aptitude tests were distributed. 
Subjects were instructed to write their name, age, 
and sex on the package. The testing sequence was: 
Advanced Vocabulary (4 minutes); Biographical 
Inventory (7½ minutes); Hidden Figures (10 
minutes); a rest break (4 minutes); Letter Span 
(5 minutes); and memory for semantic implica- 
tions (7 minutes). 

The second session was conducted one week 
later. Instructional materials corresponding to the 
LB, LA, HB, HA, or C conditions had been ran- 
domly ordered and were distributed to subjects at 
this time. Upon completing the study of the ma- 
terials, subjects took the achievement test (im- 
mediate posttest). 

The third session was conducted two weeks after 
the second session. The achievement test was ad- 
ministered again (retention testing). 

After all data were collected and the preliminary 
analysis was made, the experimenter returned to 
the junior college and discussed the study with 
interested subjects. 


RESULTS AND DISCUSSION 


Measures for Examining Learning 


The achievement test contained 32 items 
classified into four groups: (a) lower order 
questions that appeared in text, (b) higher 
order questions that appeared in text, (c) 
lower order questions that did not appear in 
text, and (d) higher order questions that did 
not appear in text. In addition to a total 
score, scores on the four groups of questions 
can be combined in several ways to investi- 
gate learning. For example, if scores on 
lower and higher order questions that did 
not appear in text (“no text”) are combined,
a measure of incidental learning is 
formed. If scores are calculated for lower 
order questions/no text and higher order 
questions/no text separately, measures of 
transfer from type of question in text to the 
same type of question are formed for lower 
and higher order question groups, respec- 
tively. Finally, if scores on test questions 
that also appear in text are calculated, a 
measure of intentional learning is obtained. 
Intercorrelations among these measures 
are presented in Table 1 (higher order 
groups) and Table 2 (lower order groups). 
Most are positive and significantly different 
from zero. The highest are part-whole cor- 
relations, in which the items on one measure 
constitute a portion on another. For ex- 
ample, half of the items forming the total 
score constitute the incidental-learning 
score. The lowest intercorrelations are between
transfer and intentional scores. These 
measures are independent of one another. 
Further, some subjects may be at ceiling on 
the intentional measure (see Table 3), and 
this restriction of range may be responsible 
for the lower correlations. Finally, note the 
difference in the correlations between in- 
tended and incidental measures in the two 
tables. This suggests that practice on and/or 
inspection of text for lower order questions 
does not facilitate answering questions on 
incidental material (see Table 2). This is 
not the case when higher order questions are 
used in text (see Table 1); they seem to 
facilitate learning of incidental material. 
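
The composite measures described above can be sketched as a small scoring function in Python. The argument names are hypothetical, and the sketch assumes that a subject's transfer and intended scores are computed on the question type used in that subject's own text, which is how the measures are described for the lower and higher order question groups.

```python
def composite_scores(lower_in_text, higher_in_text,
                     lower_no_text, higher_no_text, question_type):
    """Combine the four 8-item subscores into the learning measures.

    question_type is "lower" or "higher": the type of question that
    appeared in the subject's text (assumed scoring rule).
    """
    if question_type not in ("lower", "higher"):
        raise ValueError("question_type must be 'lower' or 'higher'")
    is_lower = question_type == "lower"
    return {
        "total": lower_in_text + higher_in_text
                 + lower_no_text + higher_no_text,
        "incidental": lower_no_text + higher_no_text,
        "transfer": lower_no_text if is_lower else higher_no_text,
        "intended": lower_in_text if is_lower else higher_in_text,
    }


# Hypothetical subscores for one subject in a higher order question group.
scores = composite_scores(7, 5, 6, 4, question_type="higher")
```

Because the incidental items make up half of the total score, the two measures share items, which is why their intercorrelation is a part-whole correlation.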


Effect of Placement and Type of Question 
on Learning: Comparison of Means 


In Table 3, means² for each measure are 
reported by treatment group and time of 
testing (immediate and retention). Data on 
the number of subjects within each group 
and the standard deviations, especially for 
the total score, indicate that the assumption 
of homogeneity of variance in statistical 
tests will be violated. Thus, although all the 
tests are performed at α = .05, the exact 
level of significance cannot be specified. 
Therefore, the tests of significance and the 
Johnson-Neyman (1936) analyses reported 
below should be interpreted with caution. 
Also, in these analyses, all comparisons between
means are more conservative than 
would be the case if sample sizes were the 
same for each treatment. 


² Additional descriptive data such as standard 
deviations and numbers of subjects are available 
from the first author. 


TABLE 1 
INTERCORRELATIONS BETWEEN MEASURES: 
HIGHER ORDER GROUPS 
Score: 1. Total; 2. Incidental; 3. Transfer; 4. Intended 
* p < .05. 

Total score. Total score data were ex- 
amined with a Treatment Group X Time 
of Testing (5 X 2) analysis of variance 
with repeated measures on the time factor. 
Means are presented in Table 3. The group 
effect was significant (F = 3.37, df = 4/65). 
Post hoc comparisons of group means using 
the Newman-Keuls method for unequal ns 
showed that Group HA scored significantly 
higher than Group LB. Other pairwise com- 
parisons showed no reliable differences. The 
time effect, as anticipated, was significant 
(F = 28.48, df = 1/65). Scores were higher 
at immediate testing (X = 23.04) than at 
retention testing (X = 21.06). The Group x 
Time interaction was not significant (F < 
1.0). Orthogonal contrasts were performed 
to test the effects of (a) type of question 
(lower order vs. higher order), (b) position 
of question in text (before vs. after), and 
(c) their interaction. Reliable differences 
were not found. Nevertheless, the means 
as presented in Table 3 show a generally 


TABLE 2 


INTERCORRELATIONS BETWEEN MEASURES: 
LOWER ORDER GROUPS


Score 


1. Total 

2. Incidental 
3. Transfer 

4. Intended 


*p < .05.


TABLE 3
MEAN SCORES FOR ALL MEASURES

                                      Lower order  Higher order  Lower order  Higher order
                                      questions    questions     questions    questions
Measure (maximum score)               before text  before text   after text   after text    Control

Total score (32)
  Immediate                              20.69        23.50         23.00        25.31        23.00
  Retention                              19.59        21.56         20.06        24.30        21.00
Incidental learning (16)
  Immediate                               9.50        11.05          9.75        11.31        11.36
  Retention                               9.12        10.11          8.50        11.40        10.69
Transfer: Lower order/no text (8)
  Immediate                               5.31         5.85          5.60         6.31         6.21
  Retention                               4.94         5.22          4.69         6.30         5.70
Transfer: Higher order/no text (8)
  Immediate                               4.06         5.20          4.15         5.00         5.14
  Retention                               4.18         4.89          3.81         5.00         5.00
Intentional learning (8)*
  Immediate                               7.38         7.40          7.15         7.00         5.83
  Retention                               5.59         6.31          6.28         6.80         5.16

* For the four question groups, intended learning is calculated from scores on questions inserted in
text. For example, the score for the group lower order questions before text is calculated from scores
on low-order items appearing in text. Since the control group did not receive questions in text, the
mean for both high- and low-order questions combined is reported as a baseline for comparison.


facilitative effect for higher order questions
rather than for lower order questions and
for questions after rather than questions
before. These data, though not significant,
conform to the findings in the literature
reviewed above.

Incidental-learning measure. Incidental
learning scores were examined with a
Treatment Group × Time of Testing (5 × 2)
analysis of variance with repeated measures
on the time factor. Means are presented in
Table 3. The group effect was significant
(F = 3.00, df = 4/65). Pairwise comparisons
showed no reliable differences. The
time effect was significant (F = 13.49,
df = 1/65). Scores were higher at immediate
testing (X = 9.81). The Group × Time
interaction was not significant (F < 1.0).
Orthogonal contrasts were performed to test
the effects of (a) type of question, (b) position
of question, and (c) their interaction.
Reliable differences were not found. The
trends among the means in this analysis
conform to the findings on question position
and type.

Transfer-learning measure. Means for the
transfer scores are presented in Table 3.
Two separate Treatment Group × Time of
Testing (3 × 2) analyses of variance with
repeated measures on the time factor were
performed on transfer scores for the lower
order plus control and higher order plus
control groups. The only main effect found
was time of testing for the lower order
groups plus control (F = 8.66, df = 1/40).
A transfer effect from a certain type of
question placed in text to the same type of
question never before seen does not occur.

Intentional-learning measure. Means for
the intentional scores are presented in Table
3. Two separate Treatment Group × Time
of Testing (3 × 2) analyses of variance
with repeated measures on the time factor
were performed on the intentional-learning
scores for the lower order plus control and
higher order plus control groups. In both
analyses, a significant group main effect was
obtained (F = 3.52, df = 2/40; F = 6.05,
df = 2/36; respectively). As expected,
exposure to questions placed in text produces
performance that is near ceiling on the same
items on the criterion test.

In the discussion of the results from the 
four learning measures, the performance of 
the control group cannot be overlooked. Ex- 
cept for Group HA, Group C usually per- 
formed as well as or better than the other 
experimental groups. Although an explana- 
tion for this finding is not readily available, 
it may involve the short length of the prose 
material. 

Aptitude by treatment interactions. If the 
five treatment conditions have similar ef- 
fects on all individuals and if achievement 
test scores are regressed on aptitude scores, 
the regression slopes for treatments should 
be parallel and the difference among slopes 
can be explained by the differences in means 
among the groups. But if the treatment 
groups do not have similar effects on all 
individuals (i.e., persons high on Aptitude A 
do well with higher order questions placed 
after text, but not with these questions 
placed before text), the regression slopes for 
treatments should not be parallel. Rather, 
they should interact and perhaps cross at 
some point. To determine if aptitude by 
treatment interactions were present, total- 
and incidental-learning scores were re- 
gressed on aptitude scores. Transfer- and 
intentional-learning measures were not ex- 
amined due to the restricted range (0-8 
points) and ceiling effect for these data. 
Separate analyses of variance of scores from
each aptitude test showed that the groups
did not differ significantly on these variables.
Except for the correlations between the
Advanced Vocabulary Test and the Biographical
Inventory (r = -.22) and the
Hidden Figures Test and the memory for
semantic implications (r = .23), intercorrelations
among aptitude tests were not significantly
different from zero.

A computer program (Dowaliby & Berliner,
1971) using the Johnson-Neyman
(1936) technique with the Potthoff (1964)
modification tested the data for interactions.
For each set of analyses, the hypothesis of
a common slope (achievement regressed on
aptitude for every possible pairwise
combination of treatment groups) is tested
(α = .05). For significant interactions, the
Johnson-Neyman technique is applied such that
a region of nonsignificance is determined
(α = .10). At this point the alpha level
should be determined by the decision maker,
since it depends on the type of risk he is
willing to take in classifying students. Cases
falling within the region of nonsignificance
may be assigned to either treatment; cases
falling outside this region should be assigned
to one or another treatment.
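
The region of nonsignificance described above can be sketched for a single pair of treatment groups. This is a minimal illustration of the basic Johnson-Neyman computation, not the Dowaliby and Berliner program: the function names, the example data, and the critical value `f_crit` (supplied by the caller, here the tabled F for 1 and 12 degrees of freedom at .05) are all assumptions, and the Potthoff modification is not implemented.

```python
import math


def fit_line(x, y):
    """Least-squares regression of achievement (y) on aptitude (x) for one group."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {"n": n, "mean_x": mean_x, "sxx": sxx,
            "a": intercept, "b": slope, "sse": sse}


def pooled_variance(g1, g2):
    """Residual variance pooled over both groups (two parameters fitted per group)."""
    return (g1["sse"] + g2["sse"]) / (g1["n"] + g2["n"] - 4)


def difference_and_variance(g1, g2, x0):
    """Predicted difference between the two lines at aptitude x0, and its variance."""
    diff = (g1["a"] - g2["a"]) + (g1["b"] - g2["b"]) * x0
    var = pooled_variance(g1, g2) * (
        1 / g1["n"] + 1 / g2["n"]
        + (x0 - g1["mean_x"]) ** 2 / g1["sxx"]
        + (x0 - g2["mean_x"]) ** 2 / g2["sxx"])
    return diff, var


def nonsignificance_region(g1, g2, f_crit):
    """Aptitude interval where diff**2 <= f_crit * var, i.e. no reliable difference.

    Returns None when the slopes do not differ enough to bound such an interval.
    """
    k = f_crit * pooled_variance(g1, g2)
    d0 = g1["a"] - g2["a"]
    d1 = g1["b"] - g2["b"]
    # Coefficients of the quadratic diff(x)**2 - k * var(x) / s2 in x.
    qa = d1 ** 2 - k * (1 / g1["sxx"] + 1 / g2["sxx"])
    qb = 2 * d0 * d1 + 2 * k * (g1["mean_x"] / g1["sxx"] + g2["mean_x"] / g2["sxx"])
    qc = d0 ** 2 - k * (1 / g1["n"] + 1 / g2["n"]
                        + g1["mean_x"] ** 2 / g1["sxx"]
                        + g2["mean_x"] ** 2 / g2["sxx"])
    if qa <= 0:
        return None
    root = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    return ((-qb - root) / (2 * qa), (-qb + root) / (2 * qa))


# Hypothetical data: one group gains with aptitude, the other does not.
aptitude = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
group_1 = fit_line(aptitude, [2 + 1.0 * x + e for x, e in zip(aptitude, noise)])
group_2 = fit_line(aptitude, [8 - 0.2 * x + e for x, e in zip(aptitude, reversed(noise))])
region = nonsignificance_region(group_1, group_2, f_crit=4.75)
```

Cases whose aptitude score falls inside `region` could be assigned to either treatment; outside it, the group with the higher predicted score would be preferred, which is the assignment logic applied in the analyses that follow.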

This analysis did not support the hypoth- 
esized interaction between the memory abil- 
ity tests (Letter Span and memory for se- 
mantic implications) and treatments. Also, 
the Biographical Inventory and the Hidden 
Figures Test did not interact with treat- 
ments. 

Total score. Advanced Vocabulary Test 
scores interacted with treatment at immedi- 
ate and retention testing (see Figures 1 and 
2). A disordinal interaction appeared with 
immediate-testing data. The regression slope 
for Group HA differs significantly from the 
slope for Groups LA and C. Subjects in 
Groups LA and HA with Advanced Vocabu- 
lary scores above 8.60 may be assigned to 
either treatment. Subjects with scores below 
8.60 on Advanced Vocabulary should be as- 
signed to the HA treatment. Fifty-six per- 
cent of the subjects in Groups LA and HA 
should be assigned to Group HA to optimize 
their achievement. Subjects in Groups HA 
and C with Advanced Vocabulary scores 
above 8.09 may be assigned to either treat- 
ment. Subjects with scores below 8.09 should 
be assigned to the HA treatment. Sixty- 
seven percent of the subjects in Groups HA 


and C should be assigned to the HA treatment.

For retention-testing data, the regression
slope for Group HA differs significantly
from the slope for Group LA. Subjects with
scores above 8.98 on Advanced Vocabulary
may be assigned to either treatment. Subjects
with scores below 8.98 should be assigned
to the HA treatment. Fifty-seven
percent of the subjects in Groups HA and
LA should be assigned to the HA treatment.


[Figure 1: regression lines of total test score (immediate testing) on
Advanced Vocabulary Test score. Lower-order before (r = -.17, b = -.32);
lower-order after (r = .65, b = .69); higher-order before (r = .31, b = .45);
higher-order after (r = -.14, b = -.12); control (r = .64, b = .87).]

FIGURE 1. Interaction of advanced vocabulary scores with treatment effects as measured
by the achievement total score at immediate testing.


[Figure 2: regression lines of total test score (retention testing) on
Advanced Vocabulary Test score. Lower-order before (r = .01, b = .02);
lower-order after (r = .70, b = .84); higher-order before (r = .36, b = .55);
higher-order after (r = 430, b = ,34); control (r = .40, b = .30).]

FIGURE 2. Interaction of advanced vocabulary scores with treatment effects as measured
by the achievement total score at retention testing.


2 The ATI data on these measures are available
from the first author.

Incidental-learning measure. Advanced
Vocabulary also interacts with treatment
using the incidental-learning measure taken
at immediate and retention testing. At
immediate testing, subjects in Groups LA
and HA (β = .45 vs. β = -.09) with aptitude
scores below 5.27 should be assigned to
the HA treatment; subjects with scores
above this value may be assigned to either
treatment. Nine percent of the subjects in
Groups LA and HA should be assigned to the
HA treatment. Subjects in Groups HA and C
(β = -.09 vs. β = .47) with Advanced
Vocabulary scores below 3.37 should be assigned
to HA; subjects with scores above
17.63 should be assigned to the C treatment.
None of the subjects in these groups had
scores beyond the critical values.

At retention testing, subjects in Groups
LA and HA (β = .57 vs. β = -.15) with
Advanced Vocabulary scores below 8.50
should be assigned to HA; subjects with
scores above that value may be assigned to
either treatment. Fifty-seven percent of the
subjects in Groups LA and HA should be
assigned to the HA treatment.

The Advanced Vocabulary Test is related 
to measures of general mental ability that, 
in most studies on instruction, yield con- 
sistent positive correlations with outcome 
measures. The unique finding in this study 
is a slightly negative correlation between 
scores on the Advanced Vocabulary Test 
and the outcome measure for subjects in the 
HA treatment. In conjunction with positive 
correlations for Groups LA and C, disordi- 
nal interactions were obtained. 

The insertion of HA questions appears to
aid subjects with low Advanced Vocabulary
scores. Questions of this type and in this
position may act in a compensatory manner
(Salomon, 1972) when the ability to interrelate
concepts and ideas in the prose material
is deficient. The HA questions prompt
subjects to link together concepts in the
prose material and/or to link the concepts
in the prose material to the subjects' own
existing cognitive structure. The effectiveness
of the HA question may also lie in its
ability to promote review in subjects who
ordinarily do not engage in such activities.

The negative slope also indicates that the 
HA treatment interferes with prose learn- 
ing for subjects high in verbal comprehen- 
sion. These subjects may possess effective 
strategies for processing prose material that 
are disrupted by insertion of external 
prompts. 

If upon replication this finding is con- 
firmed, there are immediate actions that can 
be taken to tailor instruction to the needs of 
particular learners. Verbal ability measures 
exist or are readily obtained in most instruc- 
tional settings. Instructional material for 
those who score low on such measures can 
be modified to include the insertion of higher 
order questions. If this technique can com- 
pensate for certain deficiencies in study 
skills, a large number of students can be 
helped in learning from prose material.


NOTE ON CRANO, KENNY, AND CAMPBELL'S
"DOES INTELLIGENCE CAUSE ACHIEVEMENT?"


JEAN L. DYER¹ AND LOUISE B. MILLER
University of Louisville


The cross-lagged panel correlation technique was applied to achievement
and intelligence data on young inner-city children. Findings
indicated the predominant causal sequence was that of achievement
causing intelligence as suggested by the Crano, Kenny, and Campbell
data.


Recently, Crano, Kenny, and Campbell
(1972) used the cross-lagged panel correlation
technique to investigate the direction
of the causal relationship between intelligence
and achievement. Such an analysis
requires correlational information on two
variables at more than one point in time.
Unlagged synchronous correlations and
lagged autocorrelations are checked for
significance and reliability, while the correlations
that are crossed and lagged over time
provide information regarding the direction
of the causal relationship.
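
The six correlations that make up a cross-lagged panel for two variables at two time points can be sketched as follows. This is only an illustrative sketch: the function names, dictionary keys, and scores are hypothetical, and the significance test used to compare the two cross-lagged correlations is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_lagged_panel(iq_t1, ach_t1, iq_t2, ach_t2):
    """Return the synchronous, lagged (auto), and cross-lagged correlations."""
    return {
        "synchronous_t1": pearson_r(iq_t1, ach_t1),
        "synchronous_t2": pearson_r(iq_t2, ach_t2),
        "auto_iq": pearson_r(iq_t1, iq_t2),
        "auto_ach": pearson_r(ach_t1, ach_t2),
        "cross_iq1_ach2": pearson_r(iq_t1, ach_t2),
        "cross_ach1_iq2": pearson_r(ach_t1, iq_t2),
    }


# Hypothetical scores for five children at two testings.
panel = cross_lagged_panel(
    iq_t1=[1, 2, 3, 4, 5],
    ach_t1=[2, 1, 4, 3, 5],
    iq_t2=[2, 1, 4, 3, 5],
    ach_t2=[5, 3, 4, 1, 2],
)
```

In this technique, a reliably larger `cross_ach1_iq2` than `cross_iq1_ach2` would be read as pointing toward achievement preceding intelligence, and the reverse ordering as pointing toward intelligence preceding achievement.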


The Crano et al. data at the fourth- and
sixth-grade levels on the Lorge-Thorndike
intelligence test and the Iowa Test of Basic
Skills generally supported the inference
“that the predominant causal sequence is
that of intelligence causing later achievement
[p. 266],” as indicated by a significantly
larger correlation between early
intelligence and later achievement than
between early achievement and later intelligence.
However, comparison of inner-city
and suburban school data indicated that the
relationship was in the opposite direction for
the inner-city schools, although this difference
was not significant. The authors suggested
that an abstract-to-concrete causal
sequence predominated for the suburban
children, while the opposite occurred for the
inner-city children. They also suggested that
similar studies conducted with younger children
might provide a clearer picture of the
achievement-intelligence relationship.


1 Requests for reprints should be sent to Jean L.
Dyer, School of Education, University of Louisville,
Louisville, Kentucky 40208.

Intelligence and achievement data neces- 
sary for a cross-lagged panel analysis were 
available in a longitudinal comparison of 
the effects of four Head Start programs 
(Miller & Dyer, 1970). Such a group pro- 
vided a further test of the Crano et al. hy- 
pothesis about inner-city children. 


METHOD 


Sample 


A total of 213 children participated in the ex- 
perimental Head Start programs during the Head 
Start year and were followed up in both the 
kindergarten and first-grade years. Achievement 
and intelligence tests were given in the spring of 
each year. Complete sets of achievement and in- 
telligence data over the three years were available 
on 173 children. These children were all in the 
areas of the city eligible for Title I funding. 


Tests 


The Stanford-Binet intelligence test, Revision
L-M, was individually administered to the children
in the spring of each year. Testers were graduate
students in psychology who had had testing practicums
and previous experience in administering
the Stanford-Binet to inner-city children. Internal
consistency reliability coefficients for the Stanford-Binet,
based upon coefficient α for stratified-parallel
tests (Rajaratnam, Cronbach, & Gleser, 1965;
Silverstein, 1969) were .90, .83, and .98, respectively,
for the Head Start, kindergarten, and first-grade
years.

FIG. 1. Interrelationships between intelligence
(IQ) and achievement (Ach) from Head Start
(HS) through first grade at one-year intervals.
(K = kindergarten; 1st = first grade.)

The Preschool Inventory (Caldwell, 1968), a test
of specific achievements representative of what
the child brings to an educational situation rather
than broad cognitive functioning, was administered
in the spring of the Head Start and kindergarten
years. It focuses upon the vocabulary of the young
child, his knowledge of basic numerical and sensory
concepts, and his knowledge of his own personal
world. For the Preschool Inventory, the Kuder-Richardson
Formula 20 reliability coefficients were
.89 and .85, respectively, for the Head Start and
kindergarten years. First-grade achievement was
measured by the California Achievement Test
scores, which were obtained from public school
records. Raw scores were available on both the
reading and mathematics subtests. However, no internal
consistency reliability measures were available.
The Preschool Inventory was individually
administered by testers trained by the project
staff, while the California Achievement Test was
administered in a group setting by the public
school staff.
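
For readers unfamiliar with the Kuder-Richardson Formula 20 coefficient reported for the Preschool Inventory, the standard formula can be sketched as below. The function name and the item responses are hypothetical, and the sketch assumes items scored 0 or 1 and the population form of the total-score variance (dividing by the number of examinees); it makes no claim about how the reported coefficients were actually computed.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinees, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    # Variance of examinees' total scores (population form, an assumption).
    total_var = pvariance([sum(row) for row in responses])
    # Sum over items of p * q, where p is the proportion passing the item.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical responses in which every item ranks examinees identically.
perfectly_consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
reliability = kr20(perfectly_consistent)
```

With perfectly consistent items the coefficient reaches its maximum; less consistent item sets give lower values.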


RESULTS


Since the Stanford-Binet and Preschool
Inventory correlations with the California
reading and mathematics subtest scores
were identical, separate analyses on reading
and mathematics were not made. The
cross-lagged panel results of the correlations
between intelligence and achievement for the
two one-year intervals are shown in Figure
1. For a sample size of 173, all correlations
greater than .12 were significant at the .05
level. The test-retest correlations (lagged
autocorrelations) were significant. The
synchronous, unlagged correlations between
achievement and intelligence were also
significant.

It might be noted that relatively low
test-retest correlations occurred between Head
Start and kindergarten intelligence and
between kindergarten and first-grade achievement.
The latter may be due to the shift to
a different achievement test at first grade.
However, both Head Start and kindergarten
intelligence were measured by the same test.
It was hypothesized that the low correlation
between intelligence measures at these two
points might be attributed to the immediate
influence of the experimental Head Start
programs on the rank order of the students.
In order to investigate this hypothesis,
test-retest correlations between the beginning
and end of Head Start measures for the
experimental group referred to were compared
with those for an available control group
(n = 34) who had not been exposed to any
Head Start program. On both achievement
and intelligence, these test-retest correlations
were lower for the experimental group
(IQ r = .70; achievement r = .79) than for
the control group (IQ r = .83; achievement
r = .94).

Since reliable measurement of both
achievement and intelligence was indicated
by high internal consistency coefficients and
significant test-retest correlations, comparison
of the cross-lagged correlations was
made. A significant difference occurred at
both the Head Start-kindergarten interval
(t = 3.32, df = 170, p < .01, two-tailed)
and the kindergarten-first grade interval
(t = 3.24, df = 170, p < .01, two-tailed).²
The correlation between early achievement
and later intelligence was significantly
greater than that between early intelligence
and later achievement. Comparison of the
cross-lagged correlations over the two-year
interval from Head Start to first grade
(r = .64, between Head Start achievement and
first-grade IQ, and r = .38, between Head
Start IQ and first-grade achievement)
produced a similar result (t = 8.23, df = 170,
p < .01, two-tailed). Over this two-year
period, the test-retest correlation for
intelligence was .56 and for achievement was .41.

As mentioned previously, Crano et al.
(1972) hypothesized from their results that
for inner-city children the predominant
sequence is that of achievement causing
intelligence. The present findings provide
stronger support for their hypothesis.


² The t test was the same as that used by Crano
et al. (1972; see Peters & Van Voorhis, 1940,
p. 185).


INITIAL LEVEL OF STUDENT EVALUATION OF INSTRUCTION
AS A SOURCE OF INFLUENCE ON INSTRUCTOR
CHANGE AFTER FEEDBACK¹

HAGOP S. PAMBOOKIAN²
University of Michigan 


In early October 1971, 252 students in 18 introductory and educational
psychology sections responded to a Student Opinion Questionnaire
containing measures of seven stable dimensions on college
teaching. Ten days later, the instructors, who had been grouped
according to the initial level of student evaluation, received feedback.
In December, 231 students responded again to the Student Opinion
Questionnaire. The students' initial evaluation of instruction was
found to be a significant source of influence on instructor change after
feedback. Instructors who were evaluated moderately well improved
their teaching significantly on skill, interaction, and rapport, when
compared to instructors rated more favorably. They showed trends
toward decreasing work overload and improving rapport when
compared to instructors rated more unfavorably.


Instructors can improve their teaching 
through various approaches, and one way 
of helping them bring positive changes in 
their classroom behavior is to tell them what 
their students think of their teaching. 

Not much has been done with feedback in
college classrooms. Most of the research on
feedback in teaching has been carried out
with elementary and secondary school
teachers (e.g., Gage, Runkel, & Chatterjee,
1963; Salomon & McDonald, 1970; Shively,
Van Mondfrans, & Reed, 1970; Tuckman &
Oliver, 1968). Several investigators showed
that when student evaluations of teachers
were fed back to teachers, they improved
their teaching in the sixth grade (Gage et al.,
1963), in Grades 7-12 (Bryan, 1963), and
in high school or a technical institute
(Lauroesch, Pereira, & Ryan, 1969; Tuckman &
Oliver, 1968). Daw and Gage (1967), likewise,
found that feedback from elementary
school teachers improved their principals'
behavior, while Tuckman and Oliver (1968)
showed that the source of feedback had
different effects on teachers; that is, student
feedback improved teachers’ behavior but
feedback from supervisors alone had an
adverse effect on them.


¹ This article is based on portions of a dissertation
submitted in partial fulfillment of the requirements
for the doctoral degree at the University
of Michigan.
S. Pambookian, 541 S. Ashley Street, Ann Arbor,

Other studies using feedback, however, 
reported no significant changes in vocational 
agricultural teachers (Thomas, 1969), in 
graduate teaching assistants in religion and 
earth science (Miller, 1971), and in teaching 
fellows teaching introductory and educa- 
tional psychology courses (Pambookian, 
1972). 

The present study was concerned with col- 
lege students’ evaluations of instruction and 
change by the instructor after learning of 
such ratings. It was believed that the stu- 
dents’ initial evaluations of instruction sig- 
nificantly influence the extent to which in- 
structors change their teaching as a result 
of feedback. The degree of students’ satisfaction
and dissatisfaction with teaching,
therefore, was thought to be an important
factor in the use of feedback to improve 
teaching. It was hypothesized that instruc- 
tors evaluated moderately well by their stu- 
dents prior to feedback change more after 
feedback than instructors evaluated more 
favorably or more unfavorably. 

It is likely that instructors who are given 
favorable evaluations of their teaching by 
their students are satisfied with their be- 
havior in the classroom. Hence, they may 
see no need for change and/or improvement. 
These instructors have been positively re- 
inforced in their teaching and are likely to 
continue doing what was perceived to be 
skillful and effective. The instructors who 
have been evaluated more unfavorably by 
their students, on the other hand, may be 
made anxious by the ratings. A high-anxiety 
state is likely to lead to rigidity in behavior 
that may have disruptive and interfering 
effects on teaching. The instructors who
have been evaluated moderately well, however,
are likely to be more responsive to
information (i.e., feedback) from students
and more eager to improve their teaching.
Unlike the others, these instructors are 
neither elated by more favorable evalua- 
tions nor disillusioned by more unfavorable 
ratings by students and become more at- 
tentive to their perceptions and needs, thus 
bringing about positive changes in their be- 
havior and teaching. For them, an improve- 
ment in teaching behavior is a feasible but 
challenging goal. 


METHOD 
Subjects 


Thirteen teaching fellows (hereafter referred to
as instructors) teaching sections of psychology at 
the University of Michigan participated in this 
study. All of them were advanced doctoral students 
in various psychology programs. Eight of the in- 
structors were teaching a multisection introductory 
psychology course, while the remaining five were 
teaching an introductory course in educational 
psychology. In both cases, the instructors had free- 
dom in planning their sections the way they and/or 
their students wanted, emphasizing certain aspects 
of the course and deemphasizing others based on 
their background and previous teaching experi- 
ences. There was flexibility as far as course ob- 
jectives, content, procedures, assignments, and eval- 
uations were concerned. 


Measurement of Instructor Behavior 


Behavior of the instructors was measured by the
revised McKeachie-Lin Student Opinion Questionnaire
originally developed by Isaacson, McKeachie,
and Milholland (1963).* Using the 46-item questionnaire
in introductory psychology courses at the
University of Michigan, Isaacson, McKeachie, Milholland,
Lin, Hofeller, Baerwaldt, and Zinn (1964)
isolated six stable dimensions (or factors) of teaching
on the college level as evaluated by students.
They commented, “The factor similarity analysis
suggests that six factors can be regarded as evidencing
stability over sexes, evaluation periods,
student groups, and teacher groups [p. 348].” However,
based on the latest findings in Michigan
studies, a seventh dimension was added: achievement
standard.

The experimenter selected 21 items from the
questionnaire.+ Three items with high loadings for
each of the seven dimensions were used in the
study. (In addition, two items were included that
assessed the instructor's all-around teaching ability
and the overall value of the course.) The dimensions
were (a) skill (instructor’s ability in observing
student reactions, explaining clearly, and stimulating
the intellectual curiosity of the students);
(b) overload (amount and degree of difficulty of
work assigned by the instructor); (c) structure
(the concern for keeping a tight schedule, following
an outline closely, and deciding in detail what
should be done and how); (d) feedback (keeping
students well informed of their progress, telling
them when they have done a particularly good
job, or criticizing poor work); (e) interaction (students
frequently volunteering their own opinions,
arguing with one another or with the teacher, and
expressing opinions and disagreements); (f) rapport
(instructor's being friendly, permissive, flexible,
and listening to what class members have to
say); and (g) achievement standard (instructor's
emphasis on grades, high-quality work, and maintenance
of definite standards of student performance).


Feedback Procedure 


The following three steps describe the sequence 
of data collection: 

1. All the students took the Student Opinion
Questionnaire during the fourth week of the fall
1971 term. Each instructor simultaneously responded
to the questionnaire as to how he perceived
his own teaching with regard to the items
rated. In no instance were the students identified
by name or identification number. All of the participants
were encouraged to make comments pertinent
to the items and/or the questionnaire.
Machine-scorable IBM answer sheets were used for
the answers. (At this time, 252 students responded
to the Student Opinion Questionnaire.)


*The latest available version of the questionnaire
was given to the investigator by Yi-Guang
Lin, a research psychologist at the University of
Michigan, for use in this study.

+The items on the Student Opinion Questionnaire
were rated on a 5-point scale. The points on
the scale were: almost always, often, occasionally,
seldom, and never.

2. A week or 10 days after the administration of 
the Student Opinion Questionnaire, the researcher 
spent from 30 to 45 minutes with each of the in- 
structors, and the results of student evaluations 
were presented to him. The instructors were given 
feedback for (a) the Student Opinion Question- 
naire by each item and teaching dimension, using 
the class means, as well as the frequencies and 
percentages for each option on the scale; (b) the 
Student Opinion Questionnaire ratings by student 
grade point average and sex; and (c) their own 
self-evaluations and the discrepancy scores between 
student evaluations of them and their own percep- 
tions of their teaching. 

It was made clear to the instructors that feed- 
back was a means of providing them with infor- 
mation from their students relevant to their teach- 
ing and classroom behavior. They were permitted 
to ask questions regarding the results, but discus- 
sion concerning preferred changes in teaching style 
or ways of making these changes was not allowed. 
During the feedback sessions with the instructors,
there seemed to be a genuine interest in the students'
opinions. Their interest after seeing the results
was expressed by “That’s interesting”; “It’s
good to know that”; “Beautiful”; “I am surprised,
really”; “I didn’t think they were so understanding”;
“I never thought they would respond that
way”; “I must do something now”; “They must
like me”; “I feel much better now!”

3. In early December, eight weeks after the 
feedback sessions, the Student Opinion Question- 
naire was again administered to the students and 
instructors. The instructors also responded to a few 
questions related to student evaluations and the 
impact of feedback on them and their teaching. 
At that time, instructors, as well as students, were 
given an item entitled “Feedback to Participating 
Students,” which extended appreciation for their 
cooperation. (Two hundred and thirty-one students 
responded to the Student Opinion Questionnaire.) 


Measurement of Change in Instructor 
Behavior and Teaching 


In order to have similar comparisons among the
instructor groups on teaching dimensions, the rat-
ings on each of the Student Opinion Questionnaire
items were averaged across students to give pre- and
postfeedback item means, and then the three item
means for each dimension were added.
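
The scoring rule just described can be sketched in a few lines. The ratings below are invented for illustration and are not data from the study; the function name dimension_score and the rating scale are assumptions. Gain is taken here as the postfeedback score minus the prefeedback score.

```python
from statistics import mean


def dimension_score(item_ratings):
    """Sum of per-item class means for one teaching dimension.

    item_ratings holds one list per questionnaire item (three items
    per dimension), each list containing the students' ratings.
    """
    return sum(mean(ratings) for ratings in item_ratings)


# Hypothetical ratings from three students on the three items of one dimension.
pre = [[3, 4, 3], [2, 3, 4], [4, 4, 3]]
post = [[4, 4, 4], [3, 4, 4], [4, 5, 4]]

# Gain score for this instructor on this dimension (post minus pre).
gain = dimension_score(post) - dimension_score(pre)
print(round(gain, 2))
```

Summing item means, rather than averaging them, keeps every dimension on the same three-item scale, so gains can be compared across dimensions.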

Instructors were grouped according to the initial
ratings on evaluative items of skill and rapport
(i.e., Dimensions a and f, described previously), as
research (e.g., Isaacson et al., 1964; McKeachie,
Lin, & Mann, 1971) has shown these dimensions
to be clearly indicative of effective teaching. The
three groups were (a) more favorably rated in-
structors (n = 7); (b) more moderately rated
(n = 8); and (c) more unfavorably rated (n = 3).
The groups thus formed were compared, however,
on all seven teaching dimensions. Analysis of vari-
ance and t tests were employed to test whether
the initial level of students' evaluations of instruc-
tion made any significant impact on instructor
change and improvement after feedback.


RESULTS AND DISCUSSION


Analyses of variance comparing changes 
in teaching behavior of instructors rated 
favorably, moderately, and unfavorably 
prior to feedback on each of the seven 
teaching dimensions gave some evidence of
differences among the groups on four di- 
mensions. The differences were significant 
among the groups on rapport (F = 4.11, 
df = 2/10, p < .05), and there were strong 
trends toward differences on skill (F = 3.23, 
df = 2/10, p < .08), overload (F = 3.58, 
df = 2/10, p < .07), and interaction (F = 
3.24, df = 2/10, p < .08).5 

Individual t tests were conducted to com- 
pare the difference or gain scores between 
the instructor groups. The resulting t values
indicated that the differences in gain scores
between the favorably rated instructors and
the moderately rated ones were significant
in the teaching dimensions of skill (t =
2.48, df = 10, p < .03), interaction (t =
2.52, df = 10, p < .03), and rapport (t = 
2.86, df = 10, p < .02). 

It was evident from the postfeedback 
means of instructors in both groups that the 
instructors who had been evaluated moder-
ately well prior to feedback changed (i.e.,
improved) their teaching significantly after 
the feedback in at least three dimensions 
compared to those who had been rated more 
favorably. After the feedback, the moder- 
ately rated instructors explained subject 
matter more clearly and their explanations 
were more often to the point; they became 
more skillful in observing student reactions; 
they more often stimulated the intellectual 
curiosity of their students; and they in- 
creased their interaction and rapport with 
the students. Based on the postfeedback
scores, it can be concluded that the moder-
ately evaluated instructors brought more
positive changes, and hence improvement, to
their teaching than the initially favorably
evaluated instructors. The latter tended to
regress toward the mean. Figure 1 presents
graphically the gains made by all the in-
structors on skill, interaction, and rapport.

5 In view of the small number of cases involved,
analyses of variance and t-test tables are
eliminated from inclusion in the article.

The results also indicated that compared
to the more unfavorably rated instructors,
trends toward change were found in the
moderately rated instructors in rapport (t =
1.88, df = 10, p < .09) and toward less work
overload (t = 1.90, df = 10, p < .09) in
teaching. In both cases, instructors evalu-
ated moderately well seemed to benefit more
from feedback than those more unfavorably
rated.
It is interesting to note also that, in
comparing the moderately and favorably
rated instructors on all-around teaching
ability and the overall value of the course,
the moderately rated instructors changed
more, and the changes were significantly
positive on the overall value of the course
(t = 2.82, df = 10, p < .02); they showed a
strong trend toward significant difference
with respect to all-around teaching ability
(t = 2.17, df = 10, p < .06). It appears
that feedback helped the moderately rated
instructors to enhance the value of the
course and to improve their general teach-
ing ability.


CONCLUSION AND IMPLICATIONS


The results give substantial support to
the hypothesis that moderately
evaluated instructors change more after
feedback, because they changed their teach-
ing in several dimensions. It appears that
feedback had a more positive effect on the
teaching of instructors who were moderately
evaluated prior to feedback than on other
teachers.

As predicted, the initial level of student
evaluation of instructor and instruction was
found to be a strong source of influence on
the instructor. The moderately evaluated
instructors changed significantly in skill, in-
teraction, and rapport compared to instruc-
tors who had been evaluated more favor-
ably. The contrast in changes was significant
at the .05 level and in the expected direction.


FIG. 1. Mean gain scores on each dimension for
instructors evaluated more favorably, moderately
well, and more unfavorably prior to feedback.
(Legend: favorably rated instructors, moderately
rated instructors, unfavorably rated instructors.)


That is, after the feedback, the moderately
evaluated instructors became more skillful
in their teaching and had more classroom inter-
action and rapport, as perceived by their
students. (The differences between the mod-
erately rated and the unfavorably rated in-
structors were not statistically significant.)

The implications for instruction are ob- 
vious: (a) In giving feedback to teachers 
in training and in service, the initial level 
of performance based on student evaluations 
should be taken into consideration; (b) 
teachers should be assisted in the develop- 
ment of skills to make better use of infor- 
mation on their teaching, which may not be 
very positive or favorable; and (c) other 
means should be explored for helping in- 
structors whose teaching is poor or ineffec- 
tive. Feedback as a means of bringing about 
positive change in, and improving, college 
instruction must be further investigated, 
however, with a larger number of faculty 
members in various departments to appraise 
its definitive role in teacher preparation and 
improvement.


SEQUENCE EFFECTS AND READING TIME IN
PROGRAMMED LEARNING


Undergraduates read a programmed text containing one experimental
and three placebo sections. Experimental frames were either massed
or distributed over placebo sections in logical or random sequence.
There were no effects for practice distribution. Logical sequences were
recalled better than random sequences on the posttest. These data
were replicated by groups receiving only the experimental section in
either a logical or random presentation. There were no differences
due to program length or presence of the placebo sections. Both this
study and previous research were interpreted in terms of attenuated
inspection behaviors due to both the distribution and disordering of
frames.


The appropriate sequencing of text ma-
terial has long been a canon of sound in-
structional design (cf. Gagné, 1965). In-
terestingly, studies comparing criterion
achievement from logically sequenced teach-
ing programs with that from programs in
which frames are randomly ordered yield
inconsistent results (Niedermeyer, 1968).
In spite of the careful attention paid to
correct sequence, this research
often leads to a no-difference verdict (e.g.,
Cartwright, 1971; Niedermeyer, Brown, & 
Sulzen, 1969). In fact, so few studies have 
shown a clear superiority for sequencing 
(Roe, 1962; Tobias, 1972) that the principle 
itself appears doubtful. 

Commenting on the sequence research, 
Anderson (1967, 1970) suggested that posi- 
tive effects of logical structure may be offset 
because practice on similar material is 
massed in most sequenced programs. Alter-
nately, randomizing frames serves to space
exposures to similar material, taking ad- 
vantage of distributed practice effects. What 
happens is that disordered programs allow a 
larger time interval to occur between contact 
with frames having near identical content. 
Often, adjacent frames require very similar 
responses; consequently, it may be that 
distributing encounters with interrelated 
material leads to performance facilitation in 
programs in much the same manner as it 
does with larger units of text (e.g., Reynolds 
& Glaser, 1964). In many logically se- 
quenced programs, however, practice den- 
sity is high for like materials, and one might 
expect similar responses to interfere with 
one another at criterion recall. The argu-
ment, then, is that any facilitation due to
sequence may be counterbalanced by a 
massed practice decrement. Under these 
conditions, we would be correct in predict- 
ing a no-difference result when comparing 
sequenced and unsequenced programs. The 
authors can find no research that has tested 
this possibility. 

Most studies manipulating sequence have 
varied only the order of frame presentation. 
Should distributed practice effects operate 
as we suppose, it would be difficult to inves- 
tigate them within a single-program design, 
since all the subject matter is conceptually 
related. We reasoned that the separate in-
fluence of distribution and sequence could
be isolated by simply varying the location 
and order of the experimental frames 
throughout a body of nonrelated placebo 
text. Using this procedure, we could test 
differences due to both separation of frames 
containing like material and their order of 
presentation. Also, groups receiving only the 
experimental frames, in both logical and 
random order, would allow a direct test of 
effects due to program length and presence 
or absence of placebo material. Here, the 
distributed practice hypothesis predicts an 
interaction between distribution and order, 
with little difference between massed presen- 
tations but significant facilitation when the 
critical frames are distributed throughout 
the placebo text. 

It is possible that massed presentation de- 
creases criterion performance because of 
competition among contiguous responses. If 
recall is interfered with in this fashion, one 
would expect massed subjects to select a 
greater number of incorrect, similar re-
sponses on a multiple-choice test. On a con-
structed response measure, scores should be
differentially higher for massed subjects, be-
cause interfering responses are not available
for competitive recognition. We could test
this assumption by including distractor 
choices that are selected from material ad- 
jacent to the correct response in the pro- 
gram for each multiple-choice item. Our 
prediction is that adjacent distractors will 
be chosen more often by the massed presen- 
tation learners. In the present study, the 
test mode variable was included to assess 
this hypothesis.

Finally, not only interference but the 
amount of attention the subject pays to 
the frames may be influenced by spaced 
encounters. A number of the nonsignificant 
studies have shown longer times for the dis- 
ordered frame group. The argument for the 
distribution hypothesis would be buttressed 
if we could show that frame reading time 
is greatest for the distributed groups. It 
seems possible that the amount of time the 
subject spends reading may be inversely re- 
lated to his exposure to like material prior 
to reading a particular frame. 


METHOD 


General Design 


Two variables, practice distribution and frame 
order, were combined factorially to yield four ex- 
perimental groups. A third variable, test mode, was 
varied within subjects across each factorial cell. 
The design was thus a 2 (Distributed vs. Massed 
Practice) x 2 (Logical vs. Random Order) x 2
(Multiple-Choice vs. Constructed Response Test) 
mixed analysis of variance with repeated measures 
on the test mode variable. 

Three control groups completed the experimen- 
tal design. The first of these (Control 1) studied 
only the experimental frames in normal order 
without seeing the placebo material. The second 
control (Control 2) was identical to the first, ex- 
cept that the frames were seen in a random order. 
The third group (Control 3) read only the placebo 
material. 


Procedure 


Subjects participated in groups of 8-58, with 
individuals from each condition in every session. 
Prior to handing out the materials, subjects were 
asked to define “myocardial infarction.” Those 
learners who could do so successfully were dropped 
from the data analysis. Following this pretest, 
packets of material from the seven conditions 
were handed out randomly to participants. Control 
group packets were distributed at a slightly lower 
ratio than were the experimental packets. Each 
packet contained the appropriate program booklet 
prefaced by a sheet of instructions. Subjects were 
asked to read each program frame, respond to the 
questions, and record their time for completing 
each frame from a visible time board. Subjects 
were cautioned to read steadily through the pro-
gram without returning to frames previously stud-
ied. No mention was made of the varying program
sequences. Figures recorded from the time board
were accurate to 10 seconds. When all subjects
signified that they had read and understood the
instructions, they were asked to begin.

As soon as he completed the program, the sub- 
ject raised his hand and a monitor removed the 
text and gave him the constructed response form 
of the criterion test. The multiple-choice form was 
given to the subject as soon as he finished the con-
structed response version. No time limit was im-
posed on completion of either test form.


Subjects 


The subjects were 176 undergraduates from 
Arizona State University. Twelve subjects were 
dropped from the study for failure to follow in-
structions or for prefamiliarity with the experimen-
tal subject matter. The final design contained 26
subjects in each of the four experimental groups 
and 20 in each of the controls. 
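
As a check on the design and the subject counts, the seven conditions can be enumerated and the cell sizes summed. This is only a bookkeeping sketch; the condition labels are shorthand introduced here, while the counts (176 recruited, 12 dropped, 26 per experimental cell, 20 per control) are those reported above.

```python
from itertools import product

# Four experimental cells: practice distribution crossed with frame order.
experimental = [
    f"{distribution}/{order}"
    for distribution, order in product(("massed", "distributed"),
                                       ("logical", "random"))
]

# Three control groups (labels are shorthand, not the authors' wording).
controls = [
    "control 1: experimental frames only, logical order",
    "control 2: experimental frames only, random order",
    "control 3: placebo frames only",
]

n_per_condition = {cell: 26 for cell in experimental}
n_per_condition.update({group: 20 for group in controls})

recruited, dropped = 176, 12
retained = sum(n_per_condition.values())

print(len(n_per_condition), retained, recruited - dropped)
```

The cell sizes account for every retained subject: 4 x 26 + 3 x 20 = 164 = 176 - 12.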


Materials


The experimental program contained 48 frames, 
adapted from Tobias (1968), on the diagnosis of 
myocardial infarction from electrocardiograms. The 
experimental frames averaged 24 words in length 
and included line drawings as well as text. Each 
frame required the subject to respond either in 
writing or by constructing a line drawing. A valida-
tion study prior to the main experiment assured
that (a) the program sequence was perceived as 
logical by the reader and (b) each of the 32 test 
questions could be answered correctly from reading 
the applicable program frames. 

The placebo program consisted of three 20-frame 
sections on the respiratory system (Putnam, Light- 
foot, & McDaniel, 1970), the muscles of the trunk, 
and the knee joint (McDaniel, Kindig, & Putnam, 
1965). The 60 placebo frames averaged 27 words 
in length, contained the same number of illustra- 
tions as the infarction program, and were judged 
to have no content that overlapped with the ex- 
perimental frames. The response requirements for 
the placebo program were essentially the same as 
those for the experimental program. 

Within the four treatment groups, the experi- 
mental frames were either distributed evenly 
throughout the placebo material or massed to- 
gether in one location. For the order conditions, 
experimental (E) frames were either random pres- 
entations or the logical presentations given in the 
original program sequence. Location of the massed
frame presentations within the placebo (P) sec-
tions was counterbalanced (e.g., E-P1-P2-P3; P1-E-
P2-P3; P1-P2-E-P3; P1-P2-P3-E) across each condi-
tion. In Control Groups 1 and 2, the experimental
frames were given alone in either logical or random 
order. Control Group 3 read only the placebo mate- 
rial. 
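
The construction of the program booklets can be illustrated with a short sketch. The frame labels, the function names, and in particular the rule used to spread frames "evenly" are assumptions made for illustration; the exact interleaving used in the booklets is not reported. Only the frame counts (48 experimental, three placebo sections of 20) and the four massed locations come from the description above.

```python
import random

E = [f"E{i}" for i in range(1, 49)]                            # 48 experimental frames
P = [[f"P{s}-{i}" for i in range(1, 21)] for s in (1, 2, 3)]   # three 20-frame sections


def massed(e_frames, position):
    """Place the whole experimental block before placebo section `position` (0 to 3)."""
    sections = P[:position] + [e_frames] + P[position:]
    return [frame for section in sections for frame in section]


def distributed(e_frames):
    """Spread experimental frames through the placebo text (one assumed scheme)."""
    placebo = [frame for section in P for frame in section]
    n_e, n_p = len(e_frames), len(placebo)
    out, e_i = [], 0
    for p_i, frame in enumerate(placebo, start=1):
        out.append(frame)
        # Release the next experimental frame whenever its share of the
        # program has fallen behind the share of placebo frames shown.
        while e_i < n_e and (e_i + 1) * n_p <= p_i * n_e:
            out.append(e_frames[e_i])
            e_i += 1
    return out


# Random order condition: shuffle the experimental frames (seed is arbitrary).
E_random = random.Random(0).sample(E, k=len(E))

booklet = distributed(E_random)
print(len(booklet))
```

Under this scheme no two experimental frames are ever adjacent in the distributed booklets, whereas all 48 are contiguous in the massed booklets, which is the contrast the distribution variable is meant to capture.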

The two criterion tests consisted of 32 identical 
items. Questions in the constructed response ver- 
sion contained a blank for the subject to fill in; 
whereas the multiple-choice stems were followed 
by four response alternatives. About 15% of the 
criterion items required the subject to construct or
identify an electrocardiogram drawing. In order to
test the interference notion, each multiple-choice 
item contained a distractor selected from frames 
immediately adjacent to the one containing the 
correct answer in the original program. The re- 
maining distractors were chosen from more distant 
positions. 


RESULTS


Table 1 presents criterion score means and 
standard deviations for both the experi- 
mental and control conditions. A 2 (Practice 
Distribution) x 2 (Frame Order) x 2 (Test 
Mode) analysis of variance on these data 
yielded significant main effects for order 
(F = 3.99, df = 1/100, p < .05) and test 
mode (F = 34.41, df = 1/100, p < .01).
None of the remaining terms in this analysis 
reached significance. 

To test for specific placebo material ef-
fects, a second unweighted-means analysis
was performed on the combined experi-
mental and control groups (Controls 1 and
2). This 2 (Placebo vs. No Placebo) x 2
(Frame Order) x 2 (Test Mode) analysis
showed significant results only for the
Frame Order main effect (F = 5.66, df =
1/88, p < .05). Apparently, whether or not
the subject received the placebo frames had 
little effect other than to attenuate test 
mode differences. Differences between the 
ordered and randomized controls essentially 
replicated those of the experimental groups 
(t = 2.57, df = 38, p < .05). A 2 (Practice
Distribution) x 2 (Frame Order) analysis 
of variance on the number of adjacent dis- 
tractors chosen on the multiple-choice test 
yielded no significant results. 


TABLE 1
MEAN CRITERION SCORES BY SEQUENCING AND TEST MODE COLLAPSED ACROSS DISTRIBUTION

                                                      Control group
               Experimental group      Experimental material     Placebo material
Sequencing     Multiple                Multiple                  Multiple
of frame       choice    Constructed   choice    Constructed     choice    Constructed
Logical
  M            19.53     17.08         20.81     18.75           1.25       .60
  SD            6.64      6.15          7.20      6.41           3.10       .86
Random
  M            16.83     15.06         15.81     13.90
  SD            5.94      5.84          7.39      6.26


TABLE 2
MEAN EXPERIMENTAL PROGRAM COMPLETION TIME IN MINUTES
FOR SEQUENCE AND DISTRIBUTION CONDITIONS

                        Material distribution
Sequencing of frames    Massed    Distributed    Control
Logical
  M                     23.33     22.20          26.77
  SD                     6.88      5.75           6.68
Random
  M                     23.61     17.39          25.99
  SD                     7.42      5.04           8.13


Table 2 shows the means and standard
deviations for time needed to complete only
the experimental frames. A 2 (Practice Dis-
tribution) x 2 (Frame Order) analysis of
variance for the experimental condition only
gave significant results for the practice dis-
tribution main effect (F = 8.42, df = 1/100,
p < .01) and the Practice Distribution x
Frame Order interaction (F = 4.02, df =
1/100, p < .05). Combining both the ex-
perimental and control groups (Controls 1 and 2)
into a 2 (Frame Order) x 2 (Placebo vs. No
Placebo) unweighted-means analysis of
variance yielded no significant effects.

A 2 (Practice Distribution) x 2 (Frame 
Order) analysis on errors made during the 
program yielded only one significant effect 
for the frame order variable (F = 32.13, df 
= 1/100, p < .01). Mean errors were 45.86 
for ordered and 67.34 for random learners. 


Discussion 


Clearly these results offer little support
for the distribution hypothesis. Whether
the experimental frames were massed
together or distributed across the placebo
program had little effect on amount of re-
call. Interestingly, criterion performance
was significantly suppressed by the disorder-
ing procedure, regardless of whether or not
the subject also read the placebo material.
These results are, of course, exactly what
one would predict if sequencing were of
major instructional importance. The fact
that interfering multiple-choice alternatives
were selected no more often than other al-
ternatives provides no support for the re-
sponse competition argument.


An adequate explanation of these data
seems to pivot on attending behavior. The
amount of time spent reading experimental
frames appears proportionately related to
the criterion scores. In general, the massed
presentation groups took longer to study the
infarction material than did their distributed
presentation counterparts. However, note
the form of the Practice Distribution x
Frame Order interaction. The differential be-
tween the pooled groups is almost entirely
due to the rapid study time of subjects re-
ceiving the distributed random presentation.
The criterion scores and error data follow
this same trend.

From these results we could reason that 
when material is disordered and separated, 
it is more likely to attenuate the subject's 
inspection behavior. He is less willing to 
maintain high levels of attention when sub- 
ject contiguity is disturbed along both 
these dimensions. Such disruption fails to 
occur when only disorder or distribution is 
manipulated. Our attention explanation gen- 
eralizes to results of previous studies. Many 
programs used in this research were highly 
redundant in structure. In a markedly re- 
petitive program, it would be difficult to 
distribute similar materials, simply be- 
cause much of the total sequence contains 
the same information. Here we could ex- 
pect minimal effects for disordering frames 
if the two-dimensional notion is valid. In a 
nonredundant program in which most of the 
material is critical for criterion performance, 
randomizing frames should have a measura- 
ble influence on recall, because randomizing 
in this case serves to distribute as well as 
disorder like material. 

Tobias (1972) points out that sequence 
effects are most powerful when subjects have 
low preinstruction familiarity with difficult 
program content. Our two-dimensional hy- 
pothesis would predict exactly this result. 
When learners study a subject with which 
they have little prior experience, all of the 
material becomes critical for criterion per- 
formance. Disordering of frames should lead 
to strong sequence effects under these condi-
tions because both disruption and distribu-
tion occur simultaneously, leading to the
same attenuation of inspection shown by our
data. 

If our argument is valid, the conclusions 
regarding instructional design become ob- 
vious. Sequence should receive close atten- 
tion when preinstruction familiarity is low 
and the material is nonredundant. Such 
measures are obtainable for a given instruc-
tion package by pretesting and by some
measure of “critical content,” such as the
blackout ratio (Kemp & Holland, 1966). 
Under conditions in which either, but not 
both, of these measures is high, disruption of 
“normal” order should have minimal effects.


EFFECTS OF TEST SCORE FEEDBACK ON IMMEDIATELY
SUBSEQUENT TEST PERFORMANCE 


BRENT BRIDGEMAN* 


University of Virginia 


This experiment investigated the effects of success feedback, failure
feedback, and no feedback on a scholastic aptitude test administered
immediately after feedback was given. Sham scores, ostensibly from
a similar test administered two days previously but actually randomly
determined, were given to 233 male and female seventh-grade students
immediately before they took a test that consisted of items from the
nonverbal battery of the Lorge-Thorndike Intelligence Test. Subjects
given success feedback (i.e., told that they had received a high score
on the previous test) scored significantly higher (p < .02) than sub-
jects given failure feedback. The group receiving no feedback was
not significantly different from the average of the two feedback groups.


Test scores are widely assumed to be in- 
fluenced by the motivational level and emo- 
tional state of the person taking the test. It 
is also clear that reporting test scores to 
students may affect their motivation, self- 
confidence, and anxiety level (Kirkland, 
1971). Yet, there is little evidence directly 
relating the effects of reporting success or 
failure on one test to performance on an 
immediately subsequent academic test. A
number of studies (Flook & Saggar, 1968; 
Hurlock, 1925; Means & Means, 1971; Sax 
& Reade, 1964) suggest that feedback from 
testing situations influences later test per- 
formance. In these studies, however, there 
is a period of time, ranging from a few days 
to a number of months, between the time 
of the feedback and the time of the later 
testing. The results, then, are due not only 
to possible differences in the affective state 
of the subject while taking the test but are 
presumably also strongly influenced by dif-
ferential study habits in the interval be-
tween the report of the feedback and the
subsequent testing. It is also of great inter- 
est, however, to determine the effects of 
feedback on test performance when study 
habits are not a potential intervening varia- 
ble. For example, a test administrator would 
need to know whether giving feedback mid- 
way through a testing battery would influ- 
ence scores on later tests in the battery or 
whether allowing students to grade one sec- 
tion of a test before proceeding to another 
section could influence their performance on 
subsequent sections. 

The motivational effect of knowledge of 
results when there is no opportunity for 
study between the feedback and subsequent 
testing has been well documented (Locke, 
Cartledge, & Koeppel, 1968), but the tasks 
utilized in these studies are not comparable 
to the complex demands of an academic 
test situation. As Means and Means (1971) 
noted, most of the research in this area is 
limited to simple computational, physical, 
and verbal tasks that are not academic in 
nature. This distinction is important not only 
because of the different cognitive demands 
of these tasks but also because whether the 
subject perceives the same task as a research 
project or an educationally important test 
can affect his score (Katz & Greenbaum, 
1963). The few studies that appear to use ed-
ucational tests are not really relevant to the
majority of academic tests, which are pri- 
marily power (not speed) oriented. Thus, for 
example, Osler (1954) claimed to assess feed-
back effects on a test of “intellectual per-
formance.” A group of students given sham
feedback stating that their scores on a pre- 
vious test in arithmetic put them in the 
lowest 10% of all students tested scored 
significantly lower on a subsequent test than 
another group that was told that they did 
very well. The experimental task, however, 
consisted of long-division problems admin- 
istered to eleventh-grade students under a 
strict time limit. There is a strong implica- 
tion that the experimental manipulation in- 
fluenced only speed of performance. Simi- 
larly, Anderson, White, and Wash (1966) 
demonstrated the advantageous effects of 
success feedback over failure feedback on a 
120-item multiple-choice test of addition 
and subtraction of single-digit numbers
with all subtraction differences being posi- 
tive; obviously this was a nearly pure speed 
test with their sample of female college stu- 
dents. 

The current experiment was designed to 
assess the immediate effects of success feed- 
back (a high score on a previous test), fail- 
ure feedback, or no feedback on a power 
test of academic aptitude administered to 
seventh-grade students under realistic con- 
ditions. Possible sex differences and differ- 
ential effects of feedback on subjects of dif- 
ferent ability levels were also investigated. 
Specifically, the following predictions were 
made: 

1. Subjects given success feedback score 
higher than subjects given failure feedback. 
This is the contrast of primary interest and 
is clearly directional. 

2. Subjects given no feedback differ in 
score from the average score of subjects 
given positive and negative feedback. This 
is a nondirectional test of feedback (regard- 
less of type) versus no feedback. 

3. Within males, the difference between
the success and failure conditions is unequal 
in the high- and low-ability groups. This is 
a test of a specific feedback by ability inter- 
action. 

4. Within females, the difference between
the success and failure conditions is unequal
in the high- and low-ability groups. This is
a test of a specific feedback by ability inter-
action.

5. The difference between the success and 
failure conditions will be unequal in males 
and females. This is a test of a specific feed- 
back by sex interaction. 

6. There is an overall sex difference. 

7. Within males, high-ability subjects 
score higher than low-ability subjects. This 
is simply a check on the legitimacy of using 
the short pretest as a blocking variable. 

8. Within females, high-ability subjects 
score higher than low-ability subjects. This 
is simply a check on the legitimacy of using 
the short pretest as a blocking variable. 
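
Predictions 1 and 2 can be written as contrasts among the three feedback groups. The weights below are the conventional way of expressing such comparisons and are offered as an illustration, not as the computations reported in the article; the group means are invented.

```python
groups = ("success", "failure", "none")

# Prediction 1: success feedback versus failure feedback.
c1 = {"success": 1.0, "failure": -1.0, "none": 0.0}

# Prediction 2: no feedback versus the average of the two feedback groups.
c2 = {"success": -0.5, "failure": -0.5, "none": 1.0}


def contrast_value(weights, means):
    """Weighted sum of group means for one planned contrast."""
    return sum(weights[g] * means[g] for g in groups)


# Hypothetical group means (not data from the study).
means = {"success": 30.0, "failure": 26.0, "none": 28.0}

print(contrast_value(c1, means))   # difference between success and failure
print(contrast_value(c2, means))   # no feedback minus average of feedback groups

# Each set of weights sums to zero, and with equal group sizes the two
# contrasts are orthogonal (their cross-products sum to zero).
print(sum(c1[g] * c2[g] for g in groups))
```

With these invented means the first contrast is nonzero and the second is zero, the pattern expected if feedback shifts scores up or down by equal amounts around the no-feedback level.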


METHOD 


Subjects 


The subjects were all of the seventh-grade stu- 
dents present on both testing days from two 
schools in small cities in central Wisconsin (N = 
233). It was decided in advance that approximately 
this number of students was necessary in order to 
be able to detect differences between the positive 
and negative feedback conditions (the contrast of 
primary interest) of one-half of a standard devia- 
tion with a power of .80 when alpha was set at .05. 
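
The sample-size reasoning can be reproduced approximately with the usual normal-approximation formula for comparing two group means, n per group = 2(z_alpha + z_power)^2 / d^2. This is a sketch under that approximation and not necessarily the method the author used; the function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_sided=True):
    """Approximate subjects needed per group to detect a standardized difference d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Half a standard deviation, power .80, alpha .05.
print(n_per_group(0.5))                    # nondirectional test
print(n_per_group(0.5, two_sided=False))   # directional test
```

Under this approximation about 63 subjects per group are needed for a nondirectional test and about 50 for a directional one (exact t-based figures run slightly higher). With 233 students divided among three conditions, each feedback group exceeds either figure.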


Materials 


All test items on both the pretest and the cri- 
terion test were taken from the Lorge-Thorndike 
Intelligence Tests, Multi-Level Edition, Form 1, 
nonverbal battery. This test was chosen because 
it is a widely used, well-constructed group intel-
ligence test (Cronbach, 1960). The nonverbal bat-
tery was selected because it would appear to the
student to be independent of vocabulary and read- 
ing ability, and hence students who were weak in 
those areas might still believe they could get high 
scores. 

For the pretest, items were selected from the 
first subtest in the nonverbal battery. These items 
consist of three drawings with something in com-
mon, and the student must select from five choices
which figure “goes with the first group.” 

Items for the criterion test were selected from 
Subtest 2 of the nonverbal battery and included 
all items from Levels D, E, and F (i.e., Items
15-60), which is the difficulty level appropriate for
seventh-grade students. In these items, the subject
is asked to figure out the way in which a series of
items is arranged and then to pick the next number
from five choices. On both tests, the answers were
recorded on machine-scorable answer sheets.

* Items used by permission of the publisher.


Procedure 


The testing was done near the end of the school 
year when the students normally expect to be 
taking standardized tests. The students were noti- 
fied ahead of time that they were to be tested on 
two days during the week, although the nature of 
the testing was not revealed. On the day of the 
initial testing, the experimenter told the children 
to read the instructions written on the front page 
of the test as he read them out loud. The instruc- 
tions were as follows: 


This is a test of your ability to solve ab- 
stract problems. Since your school work often 
requires memorization rather than this type of 
thinking, your score on the test may not be 
directly related to how well you have done in 
school so far. 


Standard directions (slightly modified, since dif- 
ferent test booklets and answer sheets were used) 
for taking the test then followed. The time limit 
was 13 minutes, which is slightly more time than 
is required in the test manual. This was done to 
allow even the slow students time to at least at- 
tempt nearly all of the items, which was necessary 
if they were to believe that they had done very 
well on the test. The results of the test were used 
to divide the subjects of each sex into high-, me- 
dium-, and low-ability groups, with as nearly equal 
numbers as possible in each ability level (i.e., male 
and female groups were separately divided into 
thirds, based on the scores of the first test). 
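The within-sex division into thirds can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the tuple layout of the input, the handling of tied scores, and the rule for group sizes not divisible by three are hypothetical and are not described in the text.

```python
from collections import defaultdict


def assign_ability_levels(students):
    """Split students into high/medium/low thirds separately within each sex.

    students: list of (student_id, sex, pretest_score) tuples (assumed layout).
    Returns a dict mapping student_id to "high", "medium", or "low".
    """
    by_sex = defaultdict(list)
    for student_id, sex, score in students:
        by_sex[sex].append((score, student_id))

    levels = {}
    for rows in by_sex.values():
        # Highest scores first; ties are broken by student_id (an assumption).
        rows.sort(reverse=True)
        base, extra = divmod(len(rows), 3)
        # Any leftover students go to the upper groups first (an assumption).
        sizes = [base + (1 if i < extra else 0) for i in range(3)]
        start = 0
        for label, size in zip(("high", "medium", "low"), sizes):
            for _, student_id in rows[start:start + size]:
                levels[student_id] = label
            start += size
    return levels
```

Because the ranking is done inside each sex, the cutoff scores separating the three levels can differ between males and females, which is the reason the design treats ability as nested in sex.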

Two days after the first test, the second test 
was administered. The students were cautioned not 
to talk after receiving their test papers, and to 
leave the test booklets unopened on their desks. 
The top sheet contained only the student's name. 
After all of the papers had been passed out, the 
students were instructed to open their test booklets 
to the second page, where they found their “score” 
on the previous test and an interpretation of it. 
Actually, the scores were randomly assigned (in as 
nearly equal numbers as possible) within each sex 
and ability group. Of all the children tested, 77 
received a score of 96; 77 received a score of 53; 
and 79 had a blank space after “score” on the 
interpretation sheet. On the interpretation sheet, 
90 and above corresponded to the comment “Ex- 
cellent! Your problem solving ability is among the 
best of all seventh-grade students.” The score 59 
and below corresponded to the comment “Poor. 
Your problem solving ability is among the worst 
of all seventh-grade students.” The no-feedback 
group was told, as the tests were being distributed, 
that some of the papers had not yet been graded. 
The experimenter then instructed the subjects to read
the instructions on the next page of the test booklet
as he read them aloud. The instructions were as
follows: “This is another test of your problem-solving
ability. Like the other test you took, it is not
related to how well you have done in school so far.”
The standard instructions (slightly modified) then
followed. The time limit was 17 minutes, which is
slightly more than recommended in the test manual, in
order to minimize the effects of speed and further
assure that it was basically a power test.
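The random assignment of bogus feedback within each sex and ability group, in as nearly equal numbers as possible, can be sketched as below. The function name, the block keys, and the use of a seeded shuffle are assumptions for illustration; the text does not say how the randomization was actually carried out.

```python
import random

CONDITIONS = ("success", "failure", "none")


def assign_feedback(blocks, seed=0):
    """Randomly assign feedback conditions within each block.

    blocks: dict mapping a block key, e.g. ("male", "high"), to a list of
    student ids (assumed layout). Within every block the three condition
    counts differ by at most one.
    """
    rng = random.Random(seed)
    assignment = {}
    for ids in blocks.values():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        # Random starting condition, so leftover students in blocks whose
        # size is not a multiple of three do not always get the same one.
        offset = rng.randrange(len(CONDITIONS))
        for i, student_id in enumerate(shuffled):
            assignment[student_id] = CONDITIONS[(i + offset) % len(CONDITIONS)]
    return assignment
```

Balancing within blocks rather than over the whole sample keeps feedback condition unconfounded with sex and pretest ability.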

Design 

The experimental design was a randomized block
factorial with ability nested in sex. Ability was
treated as nested within sex, since ability grouping
was done separately for males and females, and hence
the cutoff scores for the ability levels were not
identical in each sex.

Each of the eight planned comparisons was tested as a
one degree of freedom contrast. The first hypothesis
was of greatest interest and was tested with alpha set
at .05. The other seven hypotheses were each tested
with alpha set at .01 in an effort to keep the
experimental error rate small.
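The consequence of these alpha levels for the overall error rate can be worked out directly. The sketch below is illustrative only: it computes the additive (Bonferroni) upper bound, about .12, and, under the added assumption that the eight tests are independent (which the text does not claim), a rate of about .115.

```python
from math import prod

# One contrast tested at .05 and seven contrasts tested at .01.
alphas = [0.05] + [0.01] * 7

# Upper bound on the probability of at least one Type I error.
bonferroni_bound = sum(alphas)

# Rate if the eight tests were mutually independent (an assumption).
independent_rate = 1 - prod(1 - a for a in alphas)

print(f"upper bound: {bonferroni_bound:.2f}")
print(f"independent tests: {independent_rate:.3f}")
```

Had all eight contrasts been tested at .05, the corresponding upper bound would have been .40.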


RESULTS 


Scores consisted of the number of correct 
responses on the second test. Numbers, 
means, and standard deviations per cell for 
the test score data are reported in Table 1. 

For each of the eight planned comparisons, the
within-cell mean square was 20.53 (df = 1/215). The
major hypothesis that subjects given success feedback
will score higher than subjects given failure feedback
was supported (F = 5.38, p < .012, one-tailed). The
no-feedback condition was neither significantly better
nor worse than the average of the feedback conditions
(F < 1). For both males and females the specified
feedback by ability interactions were not significant
(F = 1.23 and F < 1, respectively). The specified sex
by feedback interaction was also not significant
(F < 1). Females, on the average, scored higher than
males (F = 11.07, p < .001). In both sexes, students
selected as high scorers on the pretest scored
significantly higher on the posttest than students
selected as low scorers (for males, F = 55.24,
p < .001, one-tailed; for females, F = 30.93,
p < .001, one-tailed), indicating that the pretest
scores served as a legitimate blocking variable.
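The success versus failure contrast can be recomputed from the cell sizes and cell means in Table 1 and the within-cell mean square. The sketch below assumes that the contrast weights the six cell means of each feedback condition equally; the text does not state the coefficients, so this is an assumption, although it does give a value that rounds to the reported F. The high-ability male success mean is entered as 22.62, the value implied by the .93 difference cited in the Discussion.

```python
MSW = 20.53  # within-cell mean square reported in the text

# Cell order: male high, male middle, male low,
#             female high, female middle, female low.
success = {"n": [13, 13, 13, 12, 12, 14],
           "mean": [22.62, 18.46, 16.08, 23.08, 22.25, 18.00]}
failure = {"n": [13, 13, 12, 11, 14, 14],
           "mean": [21.69, 19.38, 12.33, 21.55, 20.14, 15.21]}


def contrast_f(group_a, group_b, msw):
    """F (1 df) for the difference between two unweighted means of cell means."""
    k_a, k_b = len(group_a["mean"]), len(group_b["mean"])
    psi = sum(group_a["mean"]) / k_a - sum(group_b["mean"]) / k_b
    # Variance of the contrast: MSW * sum(c_i^2 / n_i), with c_i = +/- 1/k.
    var_psi = msw * (sum(1 / n for n in group_a["n"]) / k_a ** 2
                     + sum(1 / n for n in group_b["n"]) / k_b ** 2)
    return psi ** 2 / var_psi


f_value = contrast_f(success, failure, MSW)
print(round(f_value, 2))  # 5.38
```

The same function could be applied to other one degree of freedom comparisons by changing which cells enter each side, again under the equal-weighting assumption.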


Discussion 


The results suggest that under realistic academic
aptitude testing conditions, the student's perception
of his prior success or failure influences his
subsequent performance. The implications for the
teacher or guidance counselor are clear: comparative
feedback should be withheld until all tests in a
battery have been administered, thus assuring that
feedback from the first tests will not differentially
affect scores on the later tests.


TABLE 1
NUMBERS, MEANS, AND STANDARD DEVIATIONS ON THE CRITERION TEST

                         Males                         Females
Feedback         High     Middle   Low        High     Middle   Low
condition        ability  ability  ability    ability  ability  ability   Total

Success
  n              13       13       13         12       12       14        77
  M              22.62    18.46    16.08      23.08    22.25    18.00     19.99
  SD              4.29     3.26     4.97       3.85     3.17     5.33
Failure
  n              13       13       12         11       14       14        77
  M              21.69    19.38    12.33      21.55    20.14    15.21     18.36
  SD              5.41     4.44     6.38       3.24     3.66     4.21
None
  n              13       14       11         14       14       13        79
  M              21.85    16.00    14.36      23.00    20.14    17.23     18.91
  SD              4.24     5.10     5.99       3.06     3.68     5.56


While there has been considerable interest 
in educational expectancy effects since the 
widely reported Rosenthal and Jacobson 
(1968) study, attention has focused almost 
exclusively on teacher expectancy. Perhaps 
because of the rather indirect linkage be- 
tween the creation of expectancies in teach- 
ers and the ultimate test performance of 
their students, a number of attempts to rep- 
licate the general findings of Rosenthal and 
Jacobson have failed (Finn, 1972). The 
current experiment suggests that the more 
direct manipulation of self-expectancy, 
as through the report of the test score di- 
rectly to the student, deserves considerably 
more attention from researchers interested 
in educational expectancy effects. Similarly, 
“halo” effects, in which initial perceptions 
of success or failure influence scoring of 
subsequent test responses, should be viewed 
as affecting not only test administrators but 
test takers as well. Thus, the person who
fails initially may not only be scored lower
because of his failure, but may actually perform
more poorly than he could have because of his
perception of himself as a failure.

Since there was no significant difference 
between the no-feedback group and the 
average of the two feedback groups, any 
definitive statement is unwise. The apparent 
location of the no-feedback group between 
the success and failure feedback groups at 
least suggests that knowledge of results of 
the “motivational” (i.e., not corrective) type 
discussed by Locke, Cartledge, and Koeppel 
(1968) should not be automatically assumed 
to be facilitating, but rather careful atten- 
tion should also be given to whether the 
knowledge of results is likely to be perceived 
by the subject as indicative of success or 
failure. 

The failure to find significant sex by feedback or
ability by feedback interactions may be partially
attributable to the available sample size, which was
sufficient to provide the desired power only for the
main effects; hence the power for the interaction
hypotheses was less than adequate. Nevertheless, the
data suggest some areas for further investigation. A
possible ability by feedback interaction, especially
in males, should be studied. For males in the
high-ability group, the difference between success and
failure feedback was only .93, while in low-ability
males it was 3.75, yielding a difference of 2.82. In
females the same pattern occurred, although the
difference of the differences was only 1.26. It may be
that bright students are more likely to try their best
under any circumstances, while slower students,
especially males, give up more easily unless they are
working on tasks in which a report of past success
makes them believe they have a chance to succeed. This
speculation should be confirmed by further research,
since it is only suggested, and not really supported,
by the current investigation.
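
The differences of differences cited above follow directly from the Table 1 cell means, as this short sketch shows. The dictionary layout and names are illustrative assumptions; the high-ability male success mean is taken as 22.62, the value consistent with the .93 difference stated in the text.

```python
# Criterion-test cell means (success feedback, failure feedback) from Table 1.
cell_means = {
    ("male", "high"): (22.62, 21.69),
    ("male", "low"): (16.08, 12.33),
    ("female", "high"): (23.08, 21.55),
    ("female", "low"): (18.00, 15.21),
}


def feedback_effect(sex, ability):
    """Success-minus-failure difference in mean score for one cell."""
    success_mean, failure_mean = cell_means[(sex, ability)]
    return success_mean - failure_mean


summary = {}
for sex in ("male", "female"):
    high = feedback_effect(sex, "high")
    low = feedback_effect(sex, "low")
    summary[sex] = (round(high, 2), round(low, 2), round(low - high, 2))
    print(sex, summary[sex])
```

These are descriptive differences only; the corresponding interaction contrasts were not significant.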
EFFECTS OF MODELS OF CREATIVE PERFORMANCE ON
ABILITY TO FORMULATE HYPOTHESES


NORMAN FREDERIKSEN AND FRANKLIN R. EVANS


The effects of training procedures, ideational fluency, verbal ability,
test anxiety, and sex on Formulating Hypotheses test performance
were studied. Training consisted of presentation of models of
“acceptable” responses that stressed either quantity or quality of
performance. Both the quantity and quality models were found to be
effective in modifying behavior in the expected direction. Ideational
fluency was related to the number of hypotheses written, and verbal
ability was related to scores reflecting quality of responses. Females
were in general superior to males with respect to scores reflecting
number of responses. Test anxiety was not significantly associated
with performance. The effect of the training procedures was
interpreted as changing standards with regard to what is “good
enough” to report rather than changing ability.


A scientist who is trying to interpret the 
graphs or tables summarizing a research in- 
vestigation may begin by making sure that 
he understands the study—what the vari- 
ables mean, how the data were collected, the 
nature of any experimental treatments, etc.
Then he may engage in speculations as to 
what may account for the results. He may 
begin by being skeptical of any apparently 
obvious interpretations and look for pos- 
sible confounding, hidden sources of error, 
poor research design, or inappropriate meth- 
ods of analysis. Or he may begin by indulg- 
ing in unrestrained speculation about new 
theoretical implications and insights that 
may be suggested by the data. Eventually 
he will begin to operate under the con- 
straint that whatever interpretation he 
seriously entertains should be consistent 
with all the data and with other information
available to him. After engaging for a while in
the process of generating explanatory concepts,
on the one hand, and evaluating them against
such criteria as their
theoretical value and logical consistency, 
on the other, the scientist may be left with a 
fairly large number of hypotheses in mind 
that vary considerably with regard to 
probability of being correct. If at this point 
the scientist is asked to suggest interpreta- 
tions of the finding, he may, depending upon 
his personal predilections, propose a large 
number of hypotheses, some of which are 
blue-sky speculation, or he may propose 
only a few ideas that meet rigorous sub- 
jective standards. 

Formulating Hypotheses (Frederiksen, 
1959) is a test that was developed in an 
attempt to measure abilities of the sort re- 
quired of a research scholar who is trying to 
make sense out of research findings. Each 
item of the test consists of a graph or a 
table showing findings from an actual re- 
search study, and the task is to propose 
hypotheses to account for the findings. The 
sample item is a graph showing yearly rates 
of death from infectious diseases and from 
diseases of old age. The finding, printed 
below the graph, is as follows: “Rate of 
death from infectious diseases has decreased 
markedly since 1900, while rate of death 
from diseases of old age has increased.” The
subject is provided with an answer sheet and
instructed to write “short statements of
hypotheses (possible explanations) which 
you think might account for, or help to ac- 
count for, the finding.” Sample answers are 
provided, including the following: “As treat- 
ment of infectious diseases has become more 
successful, more people survive to old age 
and, since everyone must die, death from 
diseases of old age increases”; “Compulsory 
inoculation laws have produced a continuous 
decline in incidence of infectious diseases 
and hence a decline in mortality”; and 
“More restrictive definitions of infectious 
have reduced the number of diseases so de- 
fined, while more diseases have been defined 
as ‘old age’ diseases.” Items in the test itself 
present data showing, for example, that time 
lost from work stoppages was greatest in 
the summer months, that grades earned in 
college were on the average higher for 
younger than for older freshman students, 
and that during a certain postwar period 
employment decreased in mining and in- 
creased in all the other industries included 
in a survey. Thus the test attempts to 
simulate an aspect of the creative work of a 
scientist. 

Applying our armchair analysis of a sci- 
entist's thinking to Formulating Hypotheses 
items suggests that several cognitive abili- 
ties may be involved in taking the test. One 
kind of ability would presumably be idea- 
tional fluency, the ability to generate many 
ideas, including some unusual or original 
ideas. But because of the constraint that 
the solutions must be potentially useful in 
accounting for the finding, other cognitive 
abilities such as reasoning, memory, and 
verbal ability would be required. This implies
that a proper balance must be maintained between
generating many ideas (including some that may be
inconsequential) and discarding ideas (some of
which may be valuable) that fail to meet some
standard of rigor or probability of veridicality.
An attempt to account for individual differences
in arriving at such a balance gets us into the
domain of personality. Those who are willing to
propose interesting though implausible hypotheses
may be people with a high degree of
self-confidence or a lack of concern about
opinions of others. Those who
discard all but the most obvious hypotheses 
may be anxious or defensive. Or another 
possible way to account for the contrast is 
in terms of the kind or level of standards of 
excellence one has learned to set for him- 
self—standards that determine for each in- 
dividual when he says, “That’s good 
enough.” This study represents an attempt 
to investigate the role of some of these fac- 
tors in the process of formulating hypoth- 
eses, using Formulating Hypotheses as a 
source of the dependent variables. 

There has been a good deal of controversy 
regarding the correlation between creativity 
and intelligence, the usual finding being that 
the correlation is positive but low. But the 
measures of “creativity” that are typically 
employed in these studies are scores on tests 
that are more properly labeled fluency tests, 
reflecting number of responses and number 
of unusual responses. Measures of quality 
have rarely been used, perhaps because the 
nature of the tasks involved does not elicit 
performance of sufficient breadth and 
variety to make possible reliable judgments 
of quality. Formulating Hypotheses items 
do elicit responses that clearly vary in 
quality as well as quantity, and one might 
expect scores based on quality judgments to 
correlate with the convergent thinking tasks 
involved in the typical intelligence test, 
while number of responses might correlate 
with the measures of fluency. One of the 
purposes of the present study was to de- 
velop quality as well as quantity scores for 
Formulating Hypotheses and to investi- 
gate their relationships to verbal ability as 
well as to tests of fluency. 

In an earlier study (Klein, Frederiksen, 
& Evans, 1969), it was shown that the num- 
ber of responses to Formulating Hypotheses 
could be increased merely by presenting to 
subjects models in the form of sets of re- 
sponses containing substantially more hy- 
potheses than subjects typically write. It was 
suggested that the effect of the models was 
probably not to change ability but to alter 
subjects’ standards with regard to what con- 
stitutes satisfactory performance. Such an 
interpretation is consistent with the findings 
of Levy (1968), who showed that presenting
samples of test responses as a way of
defining a role that subjects were asked to 
adopt was effective in increasing the number 
of “original” (uncommon) responses to a 
word association task. The finding is also 
consistent with a study by Stratton and 
Brown (1972); they used a "judgment- 
training" technique that involved presenting 
examples of good and bad responses (as 
well as practice with criteria for evaluation 
of responses) and found that the training 
substantially increased the quality of the 
responses (plot titles). Stratton and Brown 
appear to attribute the improvement to an 
increase in ability to evaluate, but it seems 
equally likely that the improvement can be 
attributed to altered standards. The study 
to be reported here extends this line of re- 
search by using both quantity and quality 
models in an investigation of their main 
effects and their interactions with other 
factors. 

The study by Klein et al. was primarily 
concerned with the role of anxiety in learn- 
ing to formulate hypotheses. The specific 
hypothesis investigated was that the per- 
formance of anxious students would improve 
more than that of less anxious students as 
a result of the training with quantity 
models. The idea was that performance on
such a task involves some self-censorship of 
ideas, especially by anxious people; that is, 
people are likely to have more ideas than 
they report because they do not write down 
ideas that they think aren’t “good enough.” 
The presentation of quantity models of 
“acceptable” responses, it was thought, 
would decrease the amount of censorship, 
thereby increasing the number of responses, 
and this effect would be greater for anxious 
than for less anxious individuals. The num- 
ber of responses did increase as a result of 
the treatment, but since the interaction of 
treatment and anxiety was not significant, 
the hypothesis that the treatment would be 
more effective for anxious subjects was not 
confirmed. 

An unexpected finding was that the rela- 
tionship of anxiety to the number of hypoth- 
eses produced was curvilinear; poorest 
performance was associated with a middle 
level of anxiety and best performance, with
a low level of anxiety. This result is not
predicted by drive theory and is contrary
to Stennett’s (1957) suggestion that the
relationship between performance and drive
level is nonmonotonic with the shape of an
inverted U.

The purposes of the present study, then, 
were (a) to investigate the relationships of 
both quantity and quality scores on 
Formulating Hypotheses to measures of 
fluency and verbal ability, (b) to see if
“quality” models would be successful in
improving quality of performance, and (c) to
attempt to replicate two earlier findings— 
the effects of “quantity” models on number 
of responses and the curvilinear relationship 
of anxiety to performance. 


METHOD 


Subjects 


The subjects in this study were 395 paid volun- 
teers, mostly freshmen, at two state colleges in 
Pennsylvania. Approximately half were males and 
half females. 


Measures 


Formulating Hypotheses. In the previous study 
by Klein et al. (1969), two scores based on Formu- 
lating Hypotheses were used as dependent varia- 
bles: number of hypotheses and number of accept- 
able hypotheses, and they were also used in the 
present investigation. Number of responses (For- 
mulating Hypotheses Score 1) is the total number 
of nonduplicate hypotheses written by a subject, 
and number of acceptable responses (Formulating 
Hypotheses Score 2) is a subset of Formulating 
Hypotheses Score 1 responses that includes only 
ideas previously judged by a panel to be
“acceptable.” This score was used, in spite of its
experimental dependence on Formulating Hypotheses
Score 1, in order to make possible a more exact
replication of certain aspects of the previous study.
Formulating Hypotheses Score 3 is a new score
called average judged quality of the responses. It
is the average of ratings made by two scorers 
(using a 9-point scale) of the quality of the re- 
sponses, quality being defined in a sense that was 
consistent with the instructions to the examinees 
who took the Formulating Hypotheses test. 
Formulating Hypotheses Score 4 is called aver- 
age scale value. It is another quality score, ob- 
tained by a method that makes it relatively in- 
dependent of such qualities as length, handwriting, 
or grammatical correctness. The method made use 
of a master list of the hypotheses written by stu- 
dents, as derived from a content analysis. A panel 
of judges made evaluations of the hypotheses on
this list, and scale values were assigned on the
basis of these ratings. The scorer's task was merely 
to decide which listed hypothesis, if any, was simi- 
lar to the one being scored, and to record the num- 
ber of that listed hypothesis. The appropriate 
scale value was assigned to each response by com- 
puter. 

Formulating Hypotheses Score 5 is average 
number of words per response. It was included to 
provide some insight into the processes involved 
in modifying behavior by presenting models and 
the cues used by scorers in evaluating responses. 

An attempt to develop another score that would 
represent the rarity, unusualness, or originality of 
subjects’ responses was unsuccessful because of low 
reliability. 

Each of the five Formulating Hypotheses scores was
obtained separately for each of the seven
Formulating Hypotheses items; this made it possible to
obtain scores for subsets of items. Two subsets were 
used: (a) the first two items (which were com- 
pleted before any models were presented) and (b) 
the last five items (which were all susceptible to 
influences of the experimental treatments). The 
five Formulating Hypotheses scores based on Items 
1 and 2 are called Formulating Hypotheses pretest 
scores and were used as covariates in the statistical 
analysis. The five Formulating Hypotheses scores 
based on Items 3-7 were used as dependent varia- 
bles. Unit weights were used in obtaining the pre- 
test scores. The weights used in obtaining the five- 
item composite scores were their loadings on the 
first principal component resulting from a princi- 
pal-axes factor analysis of the intercorrelations of 
the five items. Five such factor analyses were done, 
one for each of the five Formulating Hypotheses 
scores. 
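
The weighting of the five items by their loadings on the first principal component can be sketched as follows. This is a minimal illustration, not the authors' computation: it assumes unities in the diagonal of the item intercorrelation matrix, uses simple power iteration in place of whatever factoring routine was actually employed, and all function names are hypothetical.

```python
import math


def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component of a correlation matrix.

    corr: square list-of-lists correlation matrix with 1.0 on the diagonal.
    Uses power iteration; loadings = eigenvector * sqrt(eigenvalue).
    """
    k = len(corr)
    vector = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient gives the largest eigenvalue for the unit vector.
    eigenvalue = sum(
        vector[i] * sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)
    )
    return [x * math.sqrt(eigenvalue) for x in vector]


def composite_score(item_scores, loadings):
    """Weighted sum of one subject's item scores, using loadings as weights."""
    return sum(w * s for w, s in zip(loadings, item_scores))
```

With this approach, items that correlate more strongly with the others receive larger weights; if all five items intercorrelate equally, the weights are equal and the composite reduces to a unit-weighted sum up to a constant.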

Consequences test. Consequences is one of the 
tests used by Guilford (1967) to measure divergent 
production. Each item presents a hypothetical
situation (e.g., “What would be the results if people
no longer needed or wanted sleep?”), and the task
is to list possible consequences of that situation.
Two scores were obtained, using Guilford’s scoring
method: Consequences-Obvious and Consequences-Remote,
representing Divergent Production of Semantic Units
and Divergent Production of Semantic Transformations,
respectively, in the structure-of-intellect model.
These scores were used to measure transfer of
experimental effects to another task involving
divergent production and were treated as two more
dependent variables.

Cognitive tests. Two tests from the Kit of 
Reference Tests for Cognitive Factors (French, 
Ekstrom, & Price, 1963) were also used: Advanced 
Vocabulary, a 36-item multiple-choice synonyms 
test, and the Theme test, which requires the 
subject to write two themes, the score being 
merely the number of words written. These two 
tests measure the factors of verbal comprehension 
and ideational fluency, respectively, according to 
the French et al. manual. In the structure-of-intel- 
lect model, the Theme test represents Divergent 
Production of Semantic Units. 


Test anxiety. The same inventory used in the
previous study provided the test anxiety score; 
this scale contains items from Harleston’s (1962) 
measure of test anxiety and the items from the 
Alpert-Haber (1960) debilitating anxiety scale. 


Procedure 


The procedure used was very similar to that 
employed in the previous experiment, except that 
there were two experimental treatments instead of 
one, and the data were obtained in one long even- 
ing session rather than in three separate sessions. 

The experimental treatments consisted of pro- 
viding models of acceptable performance at the 
completion of each Formulating Hypotheses item 
(except the first) in the form of a list of hypotheses 
pertaining to that item that were supposedly 
written by college students. One treatment (the 
quantity treatment) was essentially the same as 
that used in the first study; it consisted of provid- 
ing a fairly long list (18-26) of “acceptable hy- 
potheses” illustrating ideas that the subject might 
have written in response to the preceding item.
The other treatment (quality) was similar except 
that each list of acceptable hypotheses included 
only the best ideas, carefully worded. The quality 
list typically included only six or seven hypotheses, 
and they were somewhat longer than those used for 
the quantity models. 

Members of the control group received no 
models; instead, they were given various question- 
naires to occupy their time in what appeared to be 
a relevant way. 

The lists were intended to provide models of 
performance that subjects would try to emulate 
in one way or another. Both the quality and quan- 
tity materials are believed to show a rather strik- 
ing contrast to the work of most students: The 
quantity list contained many more responses than 
the average subject wrote, while the quality lists 
were noticeably superior in quality, both with re- 
spect to ideas and wording. In order to insure that 
the lists were read, the subject was instructed to 
study the list carefully, then to go back to his own 
list to make any revisions or additions he felt 
would improve the list. (The Formulating Hy- 
potheses answer sheet produced a copy of the 
subject’s responses. The original was removed be- 
fore the feedback materials were presented, and 
the revisions were made on the copy. Only the 
original was used in scoring.) 

All subjects from a given college were seated
together in a large room, and all three treatment
groups were handled simultaneously. Assignment of
treatments to subjects was accomplished by handing to
every third subject as he entered the room an envelope
containing materials for one particular treatment. All
documents were numbered as shown in Table 1, and
instructions were given by referring to document
numbers. The document in use at a particular period
was taken by the subject from the top of the pile in
his envelope and was placed at the bottom of the pile
in the envelope when completed. Subjects knew that the
materials were not identical for everyone, but they
did not know the nature or purpose of the different
treatments.


TABLE 1
SEQUENCE OF PRESENTATIONS

Document                        Group                           Time
number     Control         Quality          Quantity         (in minutes)

 1         Personality Inventory (all groups)                 Untimed
 2         Advanced Vocabulary Test (all groups)              8
 3         Theme Test (all groups)                            8
 4         Formulating Hypotheses practice item (all groups)  Untimed
 5         Formulating Hypotheses Item 1 (all groups)         10
 6         Formulating Hypotheses Item 2 (all groups)         10
 7         Questionnaire   Quality models   Quantity models   7
 8         Formulating Hypotheses Item 3 (all groups)         10
 9         Questionnaire   Quality models   Quantity models   7
10         Formulating Hypotheses Item 4 (all groups)         10
11         Questionnaire   Quality models   Quantity models   7
12         Formulating Hypotheses Item 5 (all groups)         10
13         Questionnaire   Quality models   Quantity models   7
14         Formulating Hypotheses Item 6 (all groups)         10
15         Questionnaire   Quality models   Quantity models   7
16         Formulating Hypotheses Item 7 (all groups)         10
17         Consequences Test (all groups)                     20


Statistical Analysis 


Means, standard deviations, and intercorrela- 
tions of all variables were computed for all subjects 
combined and separately for the three experimen- 
tal groups. Reliabilities were computed where pos- 
sible. 

Because of the relatively large number of de- 
pendent measures, the method of analysis chosen 
was multivariate analysis of covariance using a 
computer program by Clyde, Cramer, and Sherrin 
(1966). There were seven dependent measures: the 
five Formulating Hypotheses scores based on 
Items 3-7, Consequences-Obvious and Conse- 
quences-Remote. Since there were small differences 
between the two colleges in student performance, 
one covariate was the dichotomy college attended.
Other covariates were the five Formulating
Hypotheses pretest scores. These five scores were
appropriately used as control variables whenever we
were interested in change in performance on
Formulating Hypotheses (i.e., when investigating
effects of treatments).

The design factors were treatment, sex, vocabu- 
lary, ideational fluency, and test anxiety. However, 
it was not possible to use all five design factors in 
one analysis because some cell frequencies became 
too small. Instead, three separate multivariate 
analyses of covariance were performed with over- 
lapping factors. The three analyses employed the 
following design factors, with number of levels 
within each factor shown in parentheses: (a)
Treatment × Sex × Ideational Fluency × Test Anxiety
(3 × 2 × 2 × 3), (b) Treatment × Sex × Ideational
Fluency × Vocabulary (3 × 2 × 3 × 2), and (c)
Treatment × Sex × Vocabulary × Test Anxiety
(3 × 2 × 2 × 3). Three levels of test
anxiety and of ideational fluency were used in 
order to make possible the detection of nonlinear 
relationships. Each of the three designs was used 
once with college attended as the only covariate 
and once with the five Formulating Hypotheses 
pretest scores used as covariates in addition to 
college attended. 
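
The cell-frequency problem that motivated the three overlapping analyses can be illustrated with simple arithmetic. The sketch below is illustrative only; in particular, the level counts used for the full five-factor crossing (three levels each for ideational fluency and test anxiety, two for vocabulary) are an assumption taken from the largest numbers of levels used in the three designs.

```python
from math import prod

N = 395  # total number of subjects

designs = {
    "a": {"treatment": 3, "sex": 2, "ideational fluency": 2, "test anxiety": 3},
    "b": {"treatment": 3, "sex": 2, "ideational fluency": 3, "vocabulary": 2},
    "c": {"treatment": 3, "sex": 2, "vocabulary": 2, "test anxiety": 3},
}

# Hypothetical full crossing of all five design factors (assumed levels).
full = {"treatment": 3, "sex": 2, "vocabulary": 2,
        "ideational fluency": 3, "test anxiety": 3}


def n_cells(design):
    """Number of cells in a fully crossed factorial design."""
    return prod(design.values())


for name, design in designs.items():
    cells = n_cells(design)
    print(f"design ({name}): {cells} cells, about {N / cells:.1f} subjects per cell")

print(f"five-factor design: {n_cells(full)} cells, "
      f"about {N / n_cells(full):.1f} subjects per cell")
```

Each four-factor design has 36 cells, or roughly 11 subjects per cell on average, whereas the assumed five-factor crossing would have 108 cells with fewer than 4 subjects per cell on average.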

The multivariate analysis of variance model 
employed in the analysis first removes variance at- 
tributable to the four-way interaction, then lower 
order interactions, and finally it deals with main
effects. Each main effect was computed so that it
was orthogonal to all other main effects and all
interactions. Thus the R² may be interpreted as
the percentage of variance uniquely attributable 
to the factor under consideration. 


RESULTS 


Intercorrelations and Reliabilities 


Table 2 presents the means, standard deviations,
intercorrelations, and reliabilities of the variables
used as covariates and as design variables, using data
for all subjects combined. Since the treatments may
affect intercorrelations involving dependent
variables, the intercorrelations of dependent
variables are shown separately in Table 3 for the
three treatment groups as well as for the total group;
Table 4 shows the correlations of the dependent
variables with the other variables for the three
treatment groups and for the total group.


TABLE 2
INTERCORRELATIONS AND RELIABILITIES OF INDEPENDENT VARIABLES

Measure                      1      2      3      4      5      6      7      8      9

Correlation
1. Sex^a                     --    .12    .17    .26    .13    .22    .20    .05    .15
2. Test anxiety                    .87   -.16    .06   -.00   -.08    [?]   -.03   -.03
3. Vocabulary                             .68^c  .10    .05    [?]    .15    .04    .14
4. Ideational fluency                            .78^c  .34    .28    .06    .05    [?]
5. FH Pretest Score 1                                   .54^b  .79   -.06    .08    [?]
6. FH Pretest Score 2                                          .44^b  [?]    .28    [?]
7. FH Pretest Score 3                                                 .54    .24    [?]
8. FH Pretest Score 4                                                        .52    .32
9. FH Pretest Score 5                                                               .72

M and SD
M                           1.5   34.4   14.9  151.3   15.9   11.7    9.3   11.9   39.1
SD                           .5   16.0    4.7   35.2    5.0    4.5    2.2    2.6   12.4

Note. To be significantly different from zero, r must equal .10 (p < .05) or .13 (p < .01). Reliabilities
are in the main diagonal. Abbreviation: FH = Formulating Hypotheses. N = 395.
a Males = 1; females = 2.
b Reliability is the average interitem correlation corrected for length.
c Reliability is the correlation of separately timed halves corrected for double length.

Reliabilities of the various measures are
shown in the main diagonals of Tables 2 and
3. (Reliabilities are based on the total
group.) The correlations between the two-item
and the five-item tests are shown in
Table 4; these may be thought of as alter- 
nate form reliabilities. These correlations 
are, of course, attenuated by the fact that 
one of the two tests contains only two items. 
Formulating Hypotheses Score 5 (average 
number of words) is the most reliable and 
Formulating Hypotheses Score 4 (average 
scaled value) the least reliable of the 
Formulating Hypotheses scores. While the 
two-item pretest is adequate for use as a 
control variable, a longer test would ob- 
viously have been better. The data in- 
dicate that highly reliable measures of 
Formulating Hypotheses performance can 
be built by using a sufficient number of 
items. 
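The claim that reliability can be raised by adding items follows from the Spearman-Brown formula named in the note to Table 3. The sketch below is illustrative only: the function names are hypothetical, and the starting reliability of .48 is an assumed example value rather than a result reported here.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_needed(reliability: float, target: float) -> float:
    """Factor by which a test must be lengthened to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


# Assumed example: a score whose current reliability is .48.
current = 0.48
doubled = spearman_brown(current, 2.0)   # reliability if the test were twice as long
factor = length_needed(current, 0.80)    # lengthening needed to reach .80

print(f"doubled length: {doubled:.3f}")
print(f"length factor for .80: {factor:.2f}")
```

Under these assumptions, doubling the test raises the predicted reliability from .48 to about .65, and a test a little more than four times as long would be needed to reach .80.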

Intercorrelations of Formulating Hypotheses
test scores (see Table 3) show that
the Formulating Hypotheses Score 1 (number
of responses) and Formulating Hypotheses
Score 2 (number of acceptable responses)
are highly correlated, as is to be
expected since Formulating Hypotheses 
Score 2 is based on a subset of the responses 
contributing to Formulating Hypotheses 
Score 1. The two scores designed to measure 
quality—Formulating Hypotheses Scores 3 
and 4—are also highly correlated, in com- 
parison with their reliabilities, and Score 3 
has a substantial correlation with Formulat- 
ing Hypotheses Score 2 (number of accept- 
able hypotheses), which reflects quality as 
well as number of responses. The correlation 
between Formulating Hypotheses Score 3 
(average rated quality) and Formulating 
Hypotheses Score 5 (average number of 
words) suggests either that raters tend to be 
impressed by long responses or that a more 
lengthy response is necessary for higher 
quality. 
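Judging a correlation "in comparison with" the reliabilities of the two scores amounts to asking how large it would be if both scores were measured without error. A minimal sketch of the standard correction for attenuation follows; the function name is hypothetical and the input values are assumed for illustration, not taken from Table 3.

```python
import math


def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Assumed example values: observed r = .40, reliabilities .60 and .50.
corrected = disattenuated_r(0.40, 0.60, 0.50)
print(f"corrected correlation: {corrected:.2f}")
```

With reliabilities this low, a modest observed correlation corresponds to a much larger corrected value, which is why an observed correlation close to the reliabilities themselves can be read as high.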

Since scores on the five-item Formulating 
Hypotheses test are influenced by the ex- 
perimental treatments, it is important to 
look at their intercorrelations and correla- 
tions with other variables separately for the 
three treatment groups. Table 3 includes the 
intercorrelations of the dependent variables 
for the three treatment groups. There were 
some differences in correlations that might 
TABLE 3


INTERCORRELATIONS OF DEPENDENT VARIABLES FOR TOTAL GROUP AND FOR THE 
THREE TREATMENT GROUPS 


Measure 1 | 2 | 3 | 4 | 5 | 6 | 7 
Correlation 
1. FH Score 1 .80* .72 —.18 .01 —.28 .28 .94 
-73 —.13 —.05 —.21 .94 .38 
-67 —.16 .03 —.08 -30 E 
.78 —.21 .04 —.30 30 29 
2. FH Score 2 .67 .43 81 —.07 +22 19 
i 47 29 —.02 26 .24 
49 40 .10 23 20 
39 27 —.22 21 15 
3. FH Score 3 .60 .58 .98 —.06 — 14 
57 .36 —.06 —.16 
59 40 —.04 —.20 
T 41 .21 —.08 —.07 
4. FH Score 4 48 14 —.04 —.05 
.18 —.06 —.11 
15 —.02 =.01 
.09 —.01 —.02 
5. FH Score 5 .87 —.09 —.06 
.10 —.08 
—.18 .02 
—.26 r2 
6. Consequences-Obvious .85 M 
20 
10 
7. Consequences-Remote 75 
M and SD 
M 29.3 18.9 14.4 17.4 62.4 42.8 16.7 
29.4 18.7 14.0 17.3 62.5 43.3 17.0 
27.1 18.0 14.8 17.4 65.6 44.2 16.7 
31.4 19.9 14.3 17.5 59.3 40.9 16.5 
SD 8.0 5.6 2.7 2.1 19.5 15.1 8.0 
1.9 5.9 2.8 2.2 22.0 14.0 7.6 
6.5 4.5 2.8 2.3 17.9 16.5 8.4 
8.8 6.0 2.5 1.9 17.8 14.7 8.0 


Note. The first entry in each cell is the correlation for the total group. The next three entries are the
correlations for control, quality, and quantity treatment groups, in that order. The ns for the four
groups are, respectively, 395, 134, 129, and 132. For the total group, r must equal .10 (p < .05) or .13
(p < .01) to be significantly different from zero. For the treatment group, r must equal .17 (p < .05) or
.22 (p < .01) to be significantly different from zero. Abbreviation: FH = Formulating Hypotheses.

a Reliabilities (shown in the diagonal) are the average item intercorrelation within the total group
corrected for length by the Spearman-Brown formula.
TABLE 4


CORRELATIONS OF INDEPENDENT VARIABLES WITH DEPENDENT VARIABLES FOR TOTAL GROUP
AND FOR THE THREE TREATMENT GROUPS


Dependent variable
Independent variable | FH Score 1 | FH Score 2 | FH Score 3 | FH Score 4 | FH Score 5 | Consequences-Obvious | Consequences-Remote | M | SD
Sex -12 14 .06 .05 12 .21 —.07 1.5 5 
.00 -08 ll .08 .20 .38 —.07 1.5 5 
.19 .21 .08 | —.05 16 15 —.07 1.5 5 
.20 A7 | —.01 13 | —+04 12 — .08 1.5 5 
—.08 | —.05 .02 | —.05 
Test anxiety —.08 .05 —.07 34.4 16.0 
—.11 | —.06 -09 .20 .01 .06 —.27 32.3 | 17.4 
.06| —.05 | —.15 | —.11 | —.17 -10 .04 35.2 | 16.2 
—.01| —.16 | —.14 | —.09 | —.01 -00 .06 35.7 | 14.0 
.22 .18 13 .12 
Vocabulary 17 .05 .13 14.9 4.7 
.16 -23 14 ll +13 .08 .19 14.9 4.7 
.25 19 14 .12 .20 .05 .16 14.9 5.0 
.13 .24 .29 18 .02 .03 .03 15.0 4.5 
Ideational fluency .32 .23 | —.05 | —.08 .08 .43 .28 151.3 | 35.2 
.36 .28 .06 | —.06 15 AT .29 153.5 | 35.6 
.32 .28 | —.08 | —.12 .06 AT .30 | 148.1 | 36.9 
.29 .17 | —.12 | —.04 .02 37 .24 152.1 32.9 
FH Pretest Score 1 .51 .38 | —.10 .03 | —.15 .23 .30 15.9 5.0 
.62 .88 | —.16 .02| —.13 .26 .35 16.2 5.1 
.43 .39 .01 .08 | —.10 .21 .23 16.0 4.9 
.53 -41 | —.14 | —.01 | —.24 +22 .32 15.4 5.1 
FH Pretest Score 2 .42 .46 16 .14 | —.02 18 .20 11.7 4.5 
48 -41 .03 .12| —.03 .20 .20 11.9 4.3 
.39 .53 .30 .23 .07 .12 .16 11.8 4.6 
.45 -50 .15 .05 | —.11 .23 .24 11.4 4.5 
FH Pretest Score 3 .04 .30 .45 .23 .27 | —.01 —.07 9.3 2.2 
.06 .29 .36 .25 .24 .08 —.07 9.3 1.9 
-09 AL .54 .33 -35 | —.18 — .02 9.3 2.3 
OL .24 .45 ll .26 .05 —.12 9.4 2.4 
FH Pretest Score 4 .00 13 .26 .23 -16 .01 —.01 11.9 2.6 
—.07 .07 21 .30 -19 .01 .03 12.2 2.4 
—.03 +22 42 .33 17) —.09 —.12 11.6 3.0 
.08 13 .15 .00 12 14 .08 11.8 2.5 
FH Pretest Score 5 at —.01 -33 -13 14| —.04 —.04 39.1 | 12.4 
—.15 .03 .36 10 .80 .08 .01 39.1 | 12.2 
Bens 07 .32 .18 0| —.13 —.06 39.4 | 13.1 
Ti ht .30 -10 4| —.06 —.08 38.7 | 12.0 


Note. The first entry in each cell is the correlation for the total group. The next three entries are the
correlations for control, quality, and quantity treatment groups, in that order. The ns for the four
groups are, respectively, 395, 134, 129, and 132. For the total group, r must equal .10 (p < .05) or .13
(p < .01) to be significantly different from zero. For the treatment group, r must equal .17 (p < .05) or
.22 (p < .01) to be significantly different from zero. Abbreviation: FH = Formulating Hypotheses.
be attributable to treatments. For example,
correlations involving Formulating Hypoth- 
eses Score 5 (number of words) were ap- 
parently reduced or made negative by the 
quantity feedback. However, the differences 
are not great, and the pattern is generally 
the same for all three groups. The correla- 
tions with independent variables (see Table 
4) do not differ greatly for the three groups. 
It is therefore concluded that factor struc- 
ture was not substantially altered by the 
treatments and multivariate analysis of 
covariance results would, from this point of 
view, be interpretable. 
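The critical values of r quoted in the notes to Tables 2 through 4 can be approximated from the group sizes. The sketch below uses the Fisher z transformation with a normal approximation, which is an assumption about method made here for illustration; the original critical values may have been taken from a t-based table. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float) -> float:
    """Approximate two-tailed critical value of r for sample size n (Fisher z)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


# Group sizes from the table notes: total group (395) and one treatment group (132).
for n in (395, 132):
    for alpha in (0.05, 0.01):
        print(n, alpha, round(critical_r(n, alpha), 2))
```

For these two group sizes the approximation reproduces the thresholds given in the notes (.10 and .13 for the total group, .17 and .22 for a treatment group of 132).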


Canonical Variates 


Table 5 shows the correlations of the 
dependent variables with the four signifi- 
cant canonical variates (out of six com- 
puted) that were obtained by using college 
attended and the five Formulating Hypoth- 
eses pretest scores as covariates and with- 
out including any of the design factors. The 
canonical correlations (shown in the row 
labeled R) are correlations between the best 
weighted combination of dependent vari- 
ables and the best weighted combination of 
covariates. These canonical variates are 
orthogonal, and they result from a step- 
down model (variance attributable to the 
first canonical variate is removed before 
computing the second, etc.). The correla- 
tions of dependent variables with the 
canonical variate may be used like factor 
loadings to interpret the canonical variates. 

The first canonical variate was obviously 
defined by Formulating Hypotheses Score 
5 (average number of words). The canonical 
correlation between covariates and the de- 
pendent variables was high, about .75, no 
doubt in part because of the higher reli- 
ability of Formulating Hypotheses Score 5, 
which correlates about .98 with the canoni- 
cal variate. The only other dependent vari- 
able with appreciable correlations was 
Formulating Hypotheses Score 3, the aver- 
age quality rating. These correlations sug- 
gest that there may be a tendency for raters 
to give higher ratings to the longer re- 
sponses, or possibly, longer responses are 
necessary for statements of high quality. 

Canonical Variate 2 was defined mainly
by Formulating Hypotheses Scores 1 and 2
(number of hypotheses and number of
acceptable hypotheses). Sizable correla- 
tions also occurred for Consequences-Re- 
mote, but not for Consequences-Obvious, 
which suggests that the quantity of produc- 
tion on the Formulating Hypotheses test 
involves divergent production of semantic 
transformations more than semantic units. 

Canonical Variate 3 was mainly cor- 
related with Formulating Hypotheses Score 
3, the average quality rating. Other sub- 
stantial positive correlations were found for 
Formulating Hypotheses Score 2 (number 
of acceptable responses) and Formulating 
Hypotheses Score 4 (average scale value) 
reflecting the quality component in both 
these scores. In contrast to Canonical Vari- 
ate 2, the correlations with Consequences- 
Remote were negative, suggesting that the 
Formulating Hypotheses quality scores were 
quite different from the quantity scores 
with respect to the influence of fluency. 

The last canonical variate, which was 
barely significant, was correlated mainly 
with Formulating Hypotheses Score 4, the 
average scale value. Formulating Hypoth- 
eses Score 4 is the least reliable Formulat- 
ing Hypotheses score, and a good deal of 
the variance attributable to it had already 
been allocated to Canonical Variate 3. Thus 
there appear to be three major components 
in the domain of the dependent variables: 
length of responses, number of responses, 
and quality of responses. 

When college attended was used as the 
only covariate, the canonical variate was 
significant (p < .01), showing that college 
attended was significantly related to the 
dependent measures; but the canonical cor- 
relation was only about .25. The correla- 
tions with the canonical variate may be 
interpreted as showing that for some reason 
one college was superior with regard to the 
quality score (Formulating Hypotheses 
Score 3), and the other was superior on 
number of hypotheses (Formulating Hy- 
potheses Score 1) and on Consequences- 
Remote. 


Multivariate Analysis 


Tables 6, 7, and 8 present the salient find- 
ings of six multivariate analyses of covari- 
ance. The analyses differed with respect to 
EFFECTS OF MODELS ON FORMULATING HYPOTHESES


TABLE 6


RESULTS OF THE MULTIVARIATE ANALYSIS OF COVARIANCE FOR TREATMENT, SEX, IDEATIONAL FLUENCY,
AND TEST ANXIETY


Effect | Multivariate significance level | M | Correlation with canonical variate | p (for univariate tests)
FH Score 1 .76 | .001 
Treatment* F = 4377 Control group .03 | FH Score 5 —.46| .008 
p< .001 Quality treatment — — .47 | FH Score 2 .46 | .003 
R=  .369 Quantity treatment — .44 Consequences-Obvious —.31 | .065 
FH Score 3 —.23 | .030 
Consequences-Remote —.49 | .013 
Sex? F = 3.637| Male —.27 | FH Score 2 .45 | .023 
p< .001 Female .27 | Consequences-Obvious .44 | .024 
= .260 FH Score 5 .43 | .030 
Consequences-Obvious  .82| .001 
Ideational fluency? | F = 13.654| Low —.47 | FH Score 1 .51| .001 
p< .001 High .47 | Consequences-Remote — .48 .001 
R= .402 FH Score 2 .93 | .001 
Consequences-Remote  .57 | .033 
Test anxiety? = 1.461| Low .26 | FH Score 2 .50| .104 
p< .120| Middle .01 | FH Score 5 .86 | .259 
=  .220 High —.27 | FH Score 1 .31 .337 
TreatmentX Test = 1.274 
Anxiety interac- | p< -155) FH Score 4 — | .014 
tione exea: 


a College attended and the five Formulating Hypotheses (FH) pretest scores used as covariates.
b College attended used as the only covariate.


the combination of design factors employed,
as was described previously, and with respect
to the covariates used (college
attended only or college attended and
the five Formulating Hypotheses pretest
scores). Table 6 shows the results for the
design factors treatment, sex, ideational
fluency, and test anxiety; Table 7, the
results for treatment, sex, ideational fluency,
and vocabulary; and Table 8, the results
for treatment, sex, vocabulary, and test
anxiety. For treatment, and for interactions
involving treatment, the results are reported
only for the analyses in which Formulating
Hypotheses pretest scores as well as college
attended are used as covariates. For the
remaining design factors and interactions,
results are reported for analyses in which
college attended is the only covariate. The
Formulating Hypotheses pretest scores are
used as covariates for treatment effects
because we wish to use a measure of change
in evaluating the effects of the quality and
quantity models. Results for all the main
effects are reported, but results for interactions
are reported only if the significance
level reaches p < .05 for either multivariate
or univariate tests. The results shown for
the main effects were computed in such a
way that R² can be interpreted as the percentage
of variance uniquely attributable
to a particular factor.
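Reading R² as a share of variance is a matter of squaring the canonical R for each factor. The sketch below does this for three R values quoted in the text (.37 for treatment, .26 for sex, .30 for vocabulary); the dictionary and function names are hypothetical, and the inputs are the rounded values, so the results may differ slightly from figures computed from unrounded R.

```python
# Canonical R values as quoted (rounded) in the text.
canonical_r = {"treatment": 0.37, "sex": 0.26, "vocabulary": 0.30}


def variance_share(r: float) -> float:
    """Proportion of variance associated with a canonical correlation r."""
    return r * r


shares = {factor: round(variance_share(r), 2) for factor, r in canonical_r.items()}
for factor, share in shares.items():
    print(f"{factor}: R = {canonical_r[factor]:.2f}, R squared = {share:.2f}")
```

Because squaring shrinks values below 1, an R of .26 corresponds to only about 7% of the variance, which is worth keeping in mind when comparing factors by R alone.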

A word about the contents of Tables 6, 
7, and 8: The first column indicates what 
main effect (or interaction) is described in 
the corresponding row of the table. The 
second column shows the overall multivari- 
ate F ratio and its related p and R values. 
(The second canonical variate was not significant
in any instance.) Except in the
case of an interaction, mean scores on the 
first canonical variate for the appropriate 
subgroups are shown in the next column. 
(The grand mean is set at zero.) The next 
column shows the salient correlations of 
dependent variables with the first canoni- 
TABLE 7


RESULTS OF THE MULTIVARIATE ANALYSIS OF COVARIANCE FOR TREATMENT, SEX,
IDEATIONAL FLUENCY, AND VOCABULARY


Effect | Multivariate significance level | M | Correlation with canonical variate | p (for univariate tests)
FH Score 1 .74 | .001 
Treatment* F = 4.455 | Control group .04 | FH Score 2 .45 | .003 
p< .001| Quality treatment —.46 | FH Score 5 J —.44| .004 
R = .871 | Quantity treatment — .42 | Consequences-Obvious —.32| .061 
FH Score 3 —.22| .029 
Consequences-Remote —.54 .009 
Sex? F = 3.286 | Male —.24 | Consequences-Obvious  .53| .011 
p< .002| Female .24 | FH Score 2 .35 | .094 
R= .248 FH Score 5 .93 | .115 
Consequences-Obvious  .81| .001 
Ideational fluency | F = 8.459 | Low —.65 | FH Score 1 .51 | .001 
p< .001 | Middle —.05 | Consequences-Remote .46 | .001 
R= .497 | High .70 | FH Score 2 .80 | .005 
FH Score 3 .80 .001 
Vocabulary” F = 4.827 | Low —.28 | FH Score 4 .66 | .001 
p< .001 | High .28 | FH Score 2 .58 | .001 
= ,296 FH Score 5 .51 | .008 
FH Score 2 64 | .032 
Sex X Vocabulary | F = 1.598 FH Score 4 .61| .041 
interaction> p< .185 FH Score 3 .60 | .043 
R= M6 Consequences-Remote ^ .45 | .132 
Consequences-Obvious —.33 | .271 


a College attended and the five Formulating Hypotheses (FH) pretest scores used as covariates.


b College attended used as the only covariate.


cal variate; this information makes clear 
what constitutes each canonical variate and 
thus which dependent variables contribute 
most to the means shown in the preceding 
column. The last column shows the p values 
for univariate tests for the dependent vari- 
ables shown in the preceding column. An 
entry was made in the last column if (a) 
r > .30 for the canonical variate or (b) 
p < .05 for the univariate significance level. 

Results for treatment, sex, ideational 
fluency, and test anxiety. The largest R in 
Table 6 is .46 for ideational fluency, whose 
relationship to the first canonical variate 
was highly significant (p < .001). The cor- 
relations of dependent variables with the 
canonical variate show that ideational 
fluency was related mainly to Consequences- 
Obvious but also to Consequences-Remote 
and Formulating Hypotheses Scores 1 and 
(to some extent) 2. Univariate tests were
significant for all these variables. The results
support Guilford’s (1967) placement
of both the Theme test and Consequences- 
Obvious in the same cell of the structure-of- 
intellect model, and they also show that the 
number of hypotheses score has a large com- 
ponent of divergent production of semantic 
units. 

The second-largest R is .37, for treatments. 
The five Formulating Hypotheses pre- 
test scores as well as college attended were 
used as covariates for this design factor. 
(The effects of treatments were greater,
R = .37 as compared with .31, and more
clearly focused when these six covariates 
were used than when only college attended 
was used as a covariate, while the pattern 
of performance remained the same.) Quan- 
tity treatment improved performance as 
measured by Formulating Hypotheses Score 
1 (number of hypotheses) and Formulating 
TABLE 8


RESULTS OF THE MULTIVARIATE ANALYSIS OF COVARIANCE FOR
TREATMENT, SEX, VOCABULARY, AND TEST ANXIETY


Effect | Multivariate significance level | M | Correlation with canonical variate | p (for univariate tests)
FH Score 1 .80| .001 
Treatment* F = 4.162 | Control group .01 | FH Score 2 .48 | .002 
p< .001 | Quality treatment — .46 | FH Score 5 —.48 | .002 
R= .363 | Quantity treatment .45 | FH Score 3 —.26| .027 
Consequences-Obvious  .07 | .001 
Sex? F = 4.742 | Male —.80 | FH Score 2 .46 | .008 
p< .001| Female .30 | FH Score 1 .40 | .020 
R- .204 FH Score 5 .94| .049 
FH Score 3 .78 | .001 
Vocabulary” F = 5,128 | Low —.81 | FH Score 4 .65 | .001 
p< .001|High .81 | FH Score 2 .58| .001 
R= .304 FH Score 5 .46 | .005 
Consequences-Remote ^ .59 | .032 
Test anxiety^ F = 1.415 | Low .25 | FH Score 2 .42| .219 
p< .140 | Middle —.02 | FH Score 1 .32| .824 
- .214| High —.23 | FH Score 4 —.80| .336 
Consequences-Obvious —.53 | .026 
Sex X Vocabulary | F = 2.498 FH Score 2 .52 | .029 
interaction» p< .016 FH Score 3 „51 .033 
R= .218 FH Score 4 .50| .037 
Treatment X Test | F = 1.261 
Anxiety interac- | p < .164 FH Score 4 — | .012 
tion* R= .216 


a College attended and the five Formulating Hypotheses (FH) pretest scores used as covariates.


b College attended used as the only covariate.


Hypotheses Score 2 (number of acceptable
hypotheses), while quality treatment tended
to produce fewer, longer, and somewhat
better responses. The univariate tests for
the four Formulating Hypotheses scores
were all significant, including those for
Formulating Hypotheses Scores 3 and 5. A
one-way analysis of variance, with Formulating
Hypotheses Score 3 as the dependent
variable and with college attended and
Formulating Hypotheses Pretest Score 3 as
covariates, showed that mean performance
of the quality treatment group on Formulating
Hypotheses Score 3 was significantly
higher than that of the control group (F =
7.34, p < .007). Thus both quality and
quantity treatments produced the expected
changes in performance.

The R for sex is .26. Females were superior
to males on a canonical variate that is
positively correlated with Consequences- 
Obvious and Formulating Hypotheses 
Scores 2 and 5 and negatively correlated 
with Consequences-Remote. Thus females 
were found to be superior with regard to 
performance on tests that reflect number of 
hypotheses, number of words, and number 
of obvious consequences, but they were 
poorer on remote consequences. 

The multivariate F for test anxiety was 
not significant (p < .12), although the uni- 
variate test for Consequences-Remote was 
significant (p < .03). 

The multivariate analysis of covariance 
test for a Treatment x Anxiety interaction 
yielded a nonsignificant F. However, the 
univariate test for Formulating Hypotheses 
Score 4 was significant (p < .014). Examination
of the nine cell means for Formulating
Hypotheses Score 4 shows a tendency
for low-anxious subjects in the treatment 
groups to outperform low-anxious subjects 
in the control group, while the opposite was 
true for high- and middle-anxious subjects. 
Thus either quantity or quality treatment 
appears to benefit low-anxious subjects 
more than middle- or high-anxious subjects. 

Results for treatment, sex, ideational flu- 
ency, and vocabulary. Table 7 reports the 
multivariate analysis of covariance results 
for four design variables that include vo- 
cabulary. Verbal ability, as measured by the 
Vocabulary test score, was significantly related
(R = .30, p < .001) to a canonical
variate that clearly reflects quality of per- 
formance on Formulating Hypotheses and 
a tendency to write long responses. The 
univariate tests were all significant, and the 
relationship to verbal ability was positive, 
as is shown by the means. 

The results for treatments, sex, and idea- 
tional fluency were all quite similar to those 
shown in Table 6, except that in the case of 
sex the univariate tests were not significant 
for Formulating Hypotheses Scores 2 and 5. 
Inclusion of vocabulary as a design variable 
has apparently removed some of the vari- 
ance that was attributed to sex in the 
analyses reported in Table 6. 

Univariate tests show evidence of an in- 
teraction of sex and vocabulary involving 
Formulating Hypotheses Scores 2, 3, and 4, 
the measures reflecting quality of perform- 
ance on Formulating Hypotheses. The 
means on Formulating Hypotheses Score 2 
in the 2 X 2 interaction table show that fe- 
males generally earned higher scores, but 
the difference between males of high and 
low verbal ability was much greater than 
that between females of high and low verbal 
ability. 

Results for treatment, sex, vocabulary, 
and test anxiety. Table 8 presents results 
for the third combination of design varia- 
bles, which provides an opportunity to see if 
there is a Vocabulary x Anxiety interaction. 
None was found. The Sex x Vocabulary 
interaction did appear again, and in this 
analysis the multivariate test was significant
(p < .016) as were the univariate
tests. The univariate test for the Treatment
X Anxiety interaction was again significant 
for Formulating Hypotheses Score 4. Gen- 
erally speaking, results for main effects were 
very similar to those found in the other two 
analyses except for some differences in de- 
tails of the canonical variate for sex, which 
are attributable to the variations in design 
factors. 


Discussion 


The effects of treatment on change in per- 
formance were shown to be highly signifi- 
cant; the proportion of variance in the ca- 
nonical variate accounted for by treatment 
is about .13 when pretest scores are used as 
covariates. The effect of quantity treatment 
was basically to increase the number of 
Formulating Hypotheses responses and de- 
crease the average number of words per re- 
sponse, while the effect of the quality treat- 
ment was to increase the average number of 
words per response, increase the rated qual- 
ity of responses, and decrease the number of 
responses. The possibility of improving 
quality, as well as quantity, of creative per- 
formance by using models is thus demon- 
strated. The effect of the quality treatment 
is smaller, as judged by correlations of 
Formulating Hypotheses scores with the 
canonical variate; but a separate one-way 
analysis of variance confirms the finding 
that the quality models result in responses 
of higher quality. The change in perform- 
ance tends toward a literal copy of the 
models in terms of number, length, and 
quality of responses. No evidence of trans- 
fer of training to the Consequences test was 
found. 

Ideational fluency was found to account 
for a relatively large proportion of the vari- 
ance in the domain of dependent variables 
(R² = .23), and the other ability measure
employed, the Vocabulary test, was also
significantly related to performance (R² =
.09). These two ability measures predict
quite different aspects of performance: Ide- 
ational fluency is related to quantity of pro- 
duction (especially the Consequences test 
scores and Formulating Hypotheses Score 
1), while vocabulary is related to quality of 
performance. A definition of creativity in
terms of fluency would appear to be correct
only if the quality of creative performance 
were ignored. 

Sex was also found to be a significant factor
(R² = .07), although the proportion of
variance contributed depends somewhat on 
which other design factors are included in the 
analysis (since sex is significantly correlated 
with anxiety, vocabulary, and especially, 
ideational fluency). Females were gen- 
erally superior on a canonical variate that 
correlates positively with Consequences-Ob- 
vious and Formulating Hypotheses Scores 
2 (number of acceptable hypotheses) and 5 
(average number of words). 

Test anxiety accounted for an insignificant
proportion of the variance (R² = .04);
high anxiety was associated with poorer 
performance on Consequences-Remote. 
None of the univariate tests were significant 
for any Formulating Hypotheses score, and 
there was no evidence of a nonlinear rela- 
tionship to the canonical variate. 

A salient finding of the earlier study was 
a U-shaped relationship between test anxi- 
ety and number of hypotheses. A possible 
reason for the failure to replicate the curvi- 
linear relationship is that the relationship 
of performance to anxiety is different for 
males and females. (The previous study 
used only male subjects.) Although a sig- 
nificant Anxiety X Sex interaction was not 
found, a plot (not shown) of Formulating 
Hypotheses Score 1 (the variable involved 
in the earlier study) against the three levels 
of test anxiety showed that the same U- 
shaped curve existed for male subjects, 
while for females the curve was more or 
less linear and descending (high anxiety 
associated with fewer hypotheses). Thus 
the data at least suggest that the relation- 
ship for males is similar to that found previ- 
ously and that sex differences exist. 

Weak evidence of a Treatment X Anxiety 
interaction was found involving Formulat- 
ing Hypotheses Score 4 (average scale 
value). Since the hypothesis that motivated 
the original study (Klein et al., 1969) was 
that anxious individuals would profit more 
from the treatments than less anxious people,
and that hypothesis was not then confirmed,
the finding of even a weak Treatment
X Anxiety interaction is of interest,
even though Formulating Hypotheses Score 
4 was involved rather than Formulating 
Hypotheses Score 1. A plot (not shown) of 
treatments against Formulating Hypotheses 
Score 4 means for the three levels of test 
anxiety showed relationships completely 
unlike those predicted for Formulating Hy- 
potheses Score 1. The plot shows that for 
the control group, quality of performance 
was poorest for low-anxiety individuals, 
while for both treatment groups perform- 
ance was higher for low-anxiety subjects. 
At higher levels of anxiety, the performance 
of control group members was superior to 
that of the treatment groups. Thus the re- 
lationship was the opposite of what had 
been predicted for Formulating Hypotheses 
Score 1. However, since Formulating Hy- 
potheses Score 1 is a quantity score and 
Formulating Hypotheses Score 4 a quality 
score, the results may not be entirely in- 
consistent with the original hypothesis. 

A weak Sex x Vocabulary interaction 
was also found, the correlates of the canoni- 
cal variate including quality measures 
(Formulating Hypotheses Scores 2, 3, and 
4) and Consequences-Remote. Examination 
of appropriate plots showed that perform- 
ance of females was generally superior, but 
there was relatively little difference be- 
tween males and females of high ability 
while low-ability females were much supe- 
rior to low-ability males. 

Formulating Hypotheses appears to pos- 
sess appropriate psychometric properties for 
further explorations in the realm of creative 
performance. It possesses a certain amount 
of face validity, the items being concerned 
with interpretation of real data obtained in 
various kinds of scientific undertakings, and 
therefore may possess certain advantages 
over such tests as “brick uses” and “conse- 
quences.” The scores so far developed are 
reasonably adequate from the standpoint of 
reliability, and the interitem correlations 
are sufficiently high that one could build a 
test of almost any reliability he desires by 
increasing the number of items. The span of 
abilities covered by the present five scores 
appears to include quantity of performance, 
quality of performance, and length of responses.
It would be desirable to add scores
measuring rarity or originality of responses. 
The study provides some evidence of the 
construct validity of the test, since the 
scores generally relate to other measures 
and to treatments in ways that are logical or 
in accordance with theoretical expectations. 
Tests like Formulating Hypotheses
may be useful as provisional criterion
measures in investigations of scientific creativity:
its trainability, influences of situational
factors on it, and the cognitive, attitudinal,
and temperamental characteristics
associated with it.

The results were, in general, consistent 
with our armchair analysis of the processes 
involved in formulating hypotheses. The 
Formulating Hypotheses quantity scores 
were related to fluency and the quality 
scores to verbal ability (as measured by a 
vocabulary test). The relationships of For- 
mulating Hypotheses scores to the experi- 
mental treatments were significant and in 
the expected directions. These relationships 
may be interpreted as showing that varia- 
tions in creative performance may be in- 
fluenced by altering subjects’ standards as to 
what constitutes satisfactory performance. 
(Another possible interpretation is that the 
treatments changed ability, but the more 
parsimonious and intuitively appealing hy- 
pothesis is that presenting models of satis- 
factory performance altered standards 
rather than ability. It is unlikely that 
cognitive abilities can be substantially 
changed by a small amount of training that 
merely involves presentation of models.) 


The lack of significant relationships of 
Formulating Hypotheses scores to test anx- 
iety scores makes less tenable the notion 
that amount of self-censorship of ideas is a 
function of anxiety. 
The effects of elaborative prompts on noun-pair learning in children
were examined. The effects of these prompts were compared under 
two experimental designs—between subjects or within subjects. Of 
the total sample of 194 third-grade children, half were drawn from 
a middle-class and half from a lower-class population. Every child 
was administered perceptual and verbal ability tasks as well as a
paired-associate task consisting of a list of 25 noun pairs presented
in one of seven different ways. The results revealed that estimates
of prompt effects vary with both design type and with population.
Correlations among performances measured on the various tasks
were generally higher for the lower-class than for the middle-class
samples.


A general finding has been that middle- 
class children, on the average, obtain higher 
scores on standard IQ tests than do lower- 
class children (e.g., Janke & Havighurst, 
1945; Schaie, 1958). In contrast, there is 
evidence that such social class differences 
do not result when other tasks are used to 
index learning ability. For example, Semler 
and Iscoe (1963) compared the perform- 
ance of black and white subjects across age 
levels of five through nine years. While they 
found significantly lower full scale Wechsler 
Intelligence Scale for Children (WISC) IQs 
for the black sample, they found that on 
paired-associate tasks, the performance of 
the white children was superior to that of 
the black children only at the five- and six- 
year-old age levels. 

1 This research was supported in part by 
National Institutes of Health Grant HD03869. 
The report was prepared at the Institute of Human 
Learning, which is supported by grants from the 
National Institutes of Health. 

The authors wish to acknowledge the assistance 
of Ron Karlsberg in collecting the data for the 
study and the cooperation of the principals, 
teachers, and children of the Peres, Lake, El 
Monte, Del Mar, and Madera schools in making 
the study possible. 

2 Requests for reprints should be sent to 
William D. Rohwer, Jr., Institute of Human 
Learning, University of California, Berkeley, 
California 94720. 

In an attempt to explicate this finding, 
Rohwer, Lynch, Levin, and Suzuki (1968) 
tested first-, third-, and sixth-grade children 
from high- and low-stratum schools on one 
or another of four versions of a paired-as- 
sociate task. The study was specifically 
concerned with conditions that had been 
found to facilitate paired-associate learning 
in young children and with the effects of 
these conditions on social class differences. 
It was found that the use of connecting 
sentences between noun pairs, as compared 
with the nouns alone or the use of connect- 
ing phrases between the noun pairs, facili- 
tated learning for all subjects. Similarly, 
the use of action pictures, as compared to 
still pictures, led to better performance. 
The amount learned by older subjects was 
greater than that learned by younger sub- 
jects regardless of school stratum. In all 
conditions, low-stratum children learned as 
efficiently as high-stratum children. 

A second approach, by means of a within- 
subjects design, has also been used to exa- 
mine the effects of learning conditions in 
connection with social class differences. 
Rohwer, Ammon, Suzuki, and Levin (1971) 
selected samples of low-socioeconomic-sta- 
tus black and high-socioeconomic-status 
white children at the kindergarten, first-, 
and third-grade levels. All subjects were 
administered the Peabody Picture Vocabu- 
lary Test, Children's Progressive Matrices, 
and a series of paired-associate-learning 
tasks. Each paired-associate list was com- 
posed of five item types representing differ- 
ent amounts of prompting for elaboration. 
In general, the results revealed large popu- 
lation differences on both IQ tests at all 
grade levels (the largest difference was be- 
tween the third-grade samples), but the 
paired-associate measures summed over 
item types revealed a population difference 
only at the kindergarten level. Performance 
on the paired-associate items improved as 
a function of elaborative prompt conditions. 
Population differences in paired-associate 
learning varied across prompt conditions. 
Such differences were larger when nouns 
were presented alone or when still pictures 
were presented alone than when nouns and 
still pictures were presented together. In 
fact, the mean differences on nouns plus 
still picture items in both first grade and in 
third grade favored the low-socioeconomic- 
status black samples. 

Despite the apparent similarity of out- 
come revealed by the between- and within- 
subjects designs, a difficulty arises when 
one wishes to generalize solely from the 
mixed-list results. Mallory (1972) has sug- 
gested that an estimate of the relative ef- 
fect of a given prompt condition based only 
on data from a mixed list may be mislead- 
ing. He found that when a mixed list was 
used, second-grade children recalled more 
items of a particular type (when the re- 
maining items were presented with elabora- 
tive prompts) than they did when the same 
item type was presented in a pure list. He 
further suggested that the mixed list may 
serve to differentiate children in terms of 
the strategies they employ in learning a 
list; for example, some children may be 
visualizers and tend to concentrate on the 
visually prompted (action) items, while 
others may be verbalizers and tend to con- 
centrate on the verbally prompted (sen- 
tence) items. Accordingly, one purpose of 
the present study was to clarify the issues 


raised by the use of a within-subjects de- 
sign to investigate elaborative prompt ef- 
fects and population differences. 

A second purpose of the present research 
was to gain more information about the 
relationship between standard ability mea- 
sures and performance on a paired-associate 
task. There is evidence from studies of 
adults that the magnitude of correlation 
between performance on memory tasks and 
performance on specific ability tests is 
dependent upon the characteristics of the 
memory task (Frederiksen, 1969). In the 
present study it was predicted that larger 
correlations would be observed between 
congruent learning and ability measures 
than between incongruent ones. Specifically, 
larger correlations were expected between 
verbal tests and verbal paired-associate 
items and between a perceptual test and 
visual paired-associate items than for the 
remaining combinations of measures. 

The ability measures were also of con- 
cern in connection with the issue of popula- 
tion differences. Previous evidence suggests 
that there is a greater degree of differentia- 
tion of abilities in middle-class school-age 
children than in lower-class school-age 
children (e.g., Dockrell, 1965; Mitchell, 
1956). In the present study, it was predicted 
that the intercorrelations among the ability 
measures and the correlations between 
the ability measures and paired-associate 
learning would be greater for the lower-class 
children than for the middle-class children. 
This prediction was derived from the hy- 
pothesis that the lower degree of differentia- 
tion previously observed in lower-class 
children is due to the relatively limited 
range of cognitive strategies they employ. 

This hypothesis is consistent with the 
position espoused by Ferguson (1954) and 
developed by Frederiksen (1969). It is as- 
sumed that middle-class children have more 
opportunities to work on tasks that are in 
some ways similar to the tasks in the pres- 
ent study. Thus, they would be more likely 
to have developed specific strategies for 
selective use in varying situations. Because 
of this, the intercorrelations of their per- 
formances on the various tasks should be 
lower than those of the lower-class children. 


METHOD 


Subjects 


One hundred and ninety-four third-grade 
children served as subjects. The children came 
from five schools in a local school district, a 
district serving several diverse communities. On 
the basis of available census tract information, 
two schools that served lower-class areas were 
selected, and three schools that served middle- to 
upper-middle-class areas were selected. Ninety- 
seven children were randomly chosen from each 
of the two populations. Of the children in the 
lower-class sample, 60 were black, 32 were from 
white English-speaking families, and 5 were of 
Spanish-American descent but were fluent 
speakers of English. Of the 97 middle-class chil- 
dren, 74 were white, 13 were black, and 10 were 
Oriental. In the lower-class sample, 72 parental 
occupations could be rated on the Hollingshead 
scale (Hollingshead & Redlich, 1958). Of these 72, 
all fell within or below the fourth category (owners 
of small business, clerical and sales workers, and 
technicians) with only 11 occupations falling 
within that particular category. In the middle- 
class sample, 94 parental occupations could be 
rated on the Hollingshead scale. Of these, 38 fell 
within the first category (executives and pro- 
prietors of large concerns and major professionals) 
and all but 18 fell within the first three cate- 
gories. 

The mean age of the lower-class sample was 
9.0 years with a standard deviation of 5.45 months; 
the mean age of the middle-class sample was 8.96 
years with a standard deviation of 4.77 months. 


Design 

Ability tests. Children from the two social class 
groups were given two parts of the Primary 
Mental Abilities for Grades 2-4 (Thurstone, 1963), 
the perceptual speed test and the verbal meaning 
test. The verbal meaning test consists of two 
parts, vocabulary and sentence completion. 

Paired-associate-learning task. One group of 
children from each social class group was tested 
under a mixed-list condition; that is, they were 
given a paired-associate list containing items 
representing five conditions that differed in the 
amount and type of elaborative prompting pre- 
sented within the item. Six other groups from 
each social class were each tested under one of 
six treatment conditions consisting of a paired- 
associate list containing items, all of which 
represented one particular condition of elabora- 
tive prompting. 


Materials 
Each paired-associate list consisted of the 
same 25 pairs. The 25-item mixed list was com- 
posed of 5 item types of 5 pairs each, chosen to be 
consistent with Rohwer et al. (1971). The types 
represented the following methods of presenta- 
tion: (a) nouns (spoken nouns, no pictures); (b) 
still pictures (no verbalization); (c) nouns-still 
(a combination of a and b); (d) sentence-still (the 


two nouns of the paired-associate sequence formed 
the subject and object of an action verb; for 
example, the celery climbed the stairs), and (e) 
nouns-action (in action sequences, the objects 
were shown involved in the actions described by 
the sentences; for example, the celery was shown 
moving up the stairs). The order of the pairs in 
the list was random with respect to item type 
with the restriction that all types be represented 
once in each sequence of five pairs. 
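
The ordering restriction described above can be sketched in Python. This is only an illustration, not the authors' procedure: it assumes that "each sequence of five pairs" means consecutive blocks of five list positions (1-5, 6-10, and so on), and the pair labels are hypothetical placeholders rather than the actual noun pairs.

```python
import random

# The five item types used in the mixed list (names taken from the text).
ITEM_TYPES = ["nouns", "still", "nouns-still", "sentence-still", "nouns-action"]


def build_mixed_list(pairs_by_type, rng):
    """Order 25 pairs so every block of five contains each item type once.

    Assumption: blocks are consecutive positions 1-5, 6-10, ..., 21-25.
    """
    # Randomly decide which pair of each type goes into which block.
    pools = {t: rng.sample(pairs, len(pairs)) for t, pairs in pairs_by_type.items()}
    order = []
    for block in range(5):
        block_items = [(t, pools[t][block]) for t in ITEM_TYPES]
        rng.shuffle(block_items)  # random order of item types within the block
        order.extend(block_items)
    return order


# Hypothetical placeholder pairs: five per item type.
pairs_by_type = {t: [f"{t}-pair-{i}" for i in range(1, 6)] for t in ITEM_TYPES}
mixed_list = build_mixed_list(pairs_by_type, random.Random(0))
```

Seeding the generator makes a given order reproducible, which matters when the same list has to be recorded once and shown to every subject.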

The other six conditions (the independent 
group conditions) involved the presentation of 
one of six pure lists; that is, each list was com- 
posed exclusively of items of a single type. Five 
of these conditions consisted of the five methods 
of presentation employed in the mixed-list condi- 
tion. The additional independent group was a 
sentence-action condition, a condition that 
combined the two elaborative prompting tech- 
niques of sentence descriptions and action-picture 
realizations of those descriptions. 

The paired-associate materials were recorded 
on videotape. All verbalization was presented in a 
male voice. 


Procedure 


Fourteen subjects from each of the social class 
groups were tested under each of the six inde- 
pendent treatment conditions, and 13 under the 
mixed-list condition. 

All children were initially given two parts of 
the Primary Mental Abilities tests. The tests 
were administered in individual classrooms with 
approximately 30 children per class. The per- 
ceptual speed test was administered prior to the 
administration of the verbal meaning test. 

The paired-associate task was administered 
individually. In all conditions the child was 
seated in front of a television monitor. Subjects 
were told that they would be asked to provide the 
name of the missing object when shown or told 
the other object that had originally appeared 
with it. Four example pairs were presented on the 
screen, followed by a test trial. Instructions and 
examples were repeated if the subject did not 
comprehend the task. The appropriate 25-item 
list then followed. The lists were presented for 
two study trials, each of which was followed by a 
test trial. Presentation time per stimulus pair was 
four seconds with a one-second pause between 
pairs. The same rate was used for the presentation 
of the individual items in the test trial. 

All testing was done by one of two experi- 
menters, both of whom were white, one male and 
one female. The experimenters were approximately 
balanced with regard to schools, teachers, and 
experimental conditions. 


RESULTS 
Social Class Effects on Ability and Learning 
Tasks 


TABLE 1 

MEAN PERFORMANCE ON ABILITY TESTS AS A 
FUNCTION OF POPULATIONS 

                              Ability test 
Social class   Vocabulary   Sentence completion   Perceptual speed 
Lower             20.60            19.45               23.43 
Middle            26.08            24.21               28.14 

The mean ability test results are found in 
Table 1. The two parts of the verbal mean- 
ing test are given separately as vocabulary 
and sentence completion. The correlation 
between these two test parts, for the entire 
sample, was .60. It was decided that a cor- 
relation of this magnitude did not justify 
combining the two parts into one test score. 
An analysis of variance of the three abil- 
ity test measures with the seven treatment 
conditions nested within social class was 
performed. Treatment conditions were in- 
cluded to ascertain whether there were 
sampling differences with respect to the 
ability measures for the different treatment 
groups. This analysis yielded an overall 
significant social class effect (multivariate 
F = 38.79, df = 3/178, p < .0001). The 
univariate tests revealed a significant social 
class effect for each of the three measures 
(vocabulary, F = 96.86, df = 1/180; sen- 
tence completion, F = 90.00, df = 1/180; 
perceptual speed, F = 21.73, df = 1/180, 
all ps < .0001). There were no significant 
treatment effects within either social class; 
thus, there is no evidence of selective sam- 
pling across the treatment conditions. 
Mean paired-associate-learning-task re- 
sults are found in Table 2. An analysis of 
variance of the paired-associate-learning 
data for the six pure-list conditions was 
performed with social class nested within 
treatment conditions. By multivariate test 
(in which the two trials were treated as 
variables) the effect of social class was 
significant (multivariate F = 2.36, df = 
12/320, p < .0066). Results of the associ- 
ated univariate tests, however, indicated 
that the difference was almost entirely 
located on Trial 2 (step down F = 3.49, 
df = 6/156, p < .0030) with no significant 
difference on Trial 1 (F = 1.28, df = 
6/156). A further breakdown of this analy- 
sis revealed only one treatment condition 
for which there was a significant social class 


effect. This was the nouns condition (multi- 
variate F = 4.36, df = 2/155, p < .0145). 
Again the locus of the effect was the second 
trial. 

A multivariate analysis of the mixed-list- 
learning data (10 variables—five item types 
on each of two trials) again revealed a 
significant overall social class effect (multi- 
variate F = 2.72, df = 10/15, p < .0391), 
thus showing that the pattern of results 
was different for the two social class groups. 
Only one univariate test was significant, 
however. This was for the sentence-still 
items on the second trial. 
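
Several of the degrees of freedom reported above can be cross-checked from the design. The sketch below is a reader's consistency check, not part of the original analysis. It assumes that error df equals the number of subjects minus the number of design cells, and that a multivariate test of a single-df contrast on p variables uses the exact F with df = (p, e - p + 1), where e is the error df.

```python
def error_df(n_subjects, n_cells):
    """Within-cell error degrees of freedom."""
    return n_subjects - n_cells


def exact_multivariate_df(n_variables, err_df):
    """df of the exact F for a single-df hypothesis on n_variables measures."""
    return (n_variables, err_df - n_variables + 1)


# Ability tests: 194 children in 7 conditions x 2 social classes, 3 measures.
e_ability = error_df(194, 7 * 2)
df_ability = exact_multivariate_df(3, e_ability)

# Pure lists: 14 children in each of 6 conditions x 2 classes, 2 trials.
e_pure = error_df(14 * 6 * 2, 6 * 2)
df_nouns = exact_multivariate_df(2, e_pure)

# Mixed list: 13 children in each of 2 classes, 10 variables.
e_mixed = error_df(13 * 2, 2)
df_mixed = exact_multivariate_df(10, e_mixed)

print(e_ability, df_ability)  # 180 (3, 178)
print(e_pure, df_nouns)       # 156 (2, 155)
print(e_mixed, df_mixed)      # 24 (10, 15)
```

These values agree with the reported df of 1/180 and 3/178 for the ability tests, 2/155 for the nouns condition, and 10/15 for the mixed list.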


Condition Effects and Design Comparisons 
in Paired-Associate Learning 


A graph of the mean paired-associate- 
learning performance for both the inde- 
pendent groups and the mixed-list condition 
can be found in Figure 1. 

TABLE 2 

MEAN PERFORMANCE IN PAIRED-ASSOCIATE 
LEARNING AS A FUNCTION OF POPULATIONS 
AND CONDITIONS 

                                  Paired-associate task 
Condition          Social class     Trial 1     Trial 2 
Nouns              Lower              3.86        6.00 
                   Middle             3.79        8.64 
Still              Lower              7.64       13.93 
                   Middle             8.86       13.07 
Nouns-still        Lower              9.00       15.50 
                   Middle            10.50       18.14 
Sentence-still     Lower              9.79       16.79 
                   Middle            13.07       19.71 
Nouns-action       Lower             13.79       19.71 
                   Middle            14.00       21.71 
Sentence-action    Lower             15.50       21.14 
                   Middle            13.57       20.64 
Mixed list         Lower             11.15       16.54 
                   Middle            11.08       18.15 

To analyze the effects of prompt condi- 
tions, the simple effects of conditions were 
assessed within social class. For the between- 
subjects design, the following contrasts be- 
tween group means were performed sepa- 
rately for each social class: (a) nouns versus 
still; (b) nouns-still versus still; (c) nouns- 
action versus nouns-still; (d) sentence-still 
versus nouns-still; and (e) sentence-action 
versus nouns-action. Tests corresponding 
to the first four of these contrasts were also 
made for the mixed-list design. These tests 
involved the formation of four new varia- 
bles representing the differences between 
item types and the performance of an ex- 
act test on each of these variables to de- 
termine if it differed significantly from zero 
(i.e., H0: μ = 0). 
A summary of the results can be found in 
Table 3. For the six independent groups, 
wherever item type differences were found, 
they were found in both social class groups. 
Pictorial presentation was found to yield 
superior performance to verbal presentation 
(still > nouns), with the addition of labels 
to the pictures yielding even better per- 
formance (nouns-still > still). Increased 
prompting for visual elaboration was also 
found to improve performance (nouns-ac- 
tion > nouns-still). However, in contrast 
to previous findings (Rohwer et al., 1968), 
increased prompting for verbal elaboration 
did not significantly affect performance 
(sentence-still vs. nouns-still, sentence- 
action vs. nouns-action). 

FIG. 1. Mean number of correct responses on 
the paired-associate task as a function of popula- 
tions, conditions, and designs. (Upper panel: 
between-subjects design; lower panel: within- 
subjects design. Abbreviations: N = nouns; S = 
still; NS = nouns-still; SS = sentence-still; 
NA = nouns-action; SA = sentence-action.) 

TABLE 3 

CONDITION EFFECTS AS A FUNCTION OF 
POPULATIONS AND DESIGNS 

                                                Between-       Within- 
Comparison                      Social class    subjects F     subjects F 
                                                (df = 1/156)   (df = 1/24) 
Nouns vs. still                 Lower           103.77**         .14 
                                Middle           12.28**        7.01* 
Nouns-still vs. still           Lower            32.67**       24.79** 
                                Middle            6.14**        4.55* 
Nouns-action vs. nouns-still    Lower            19.74**        3.45 
                                Middle            6.81**       13.79** 
Sentence-still vs. nouns-still  Lower              .58           .00 
                                Middle            2.34          7.47* 
Sentence-action vs.             Lower             1.34 
  nouns-action                  Middle          [illegible] 

*p < .05. 
**[footnote illegible in source] 
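
The within-subjects contrasts described above (difference scores tested against zero) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' computation. The difference scores are hypothetical, and the use of an error term pooled over both social classes is inferred only from the reported df of 1/24 (26 children minus 2 groups).

```python
from statistics import mean


def contrast_f(diffs_by_class, target):
    """F test of H0: mean difference = 0 within one class.

    Assumption: the error term is the within-class variance of the
    difference scores pooled over all classes.
    """
    n_total = sum(len(d) for d in diffs_by_class.values())
    err_df = n_total - len(diffs_by_class)
    ss_within = sum(
        sum((x - mean(d)) ** 2 for x in d) for d in diffs_by_class.values()
    )
    mse = ss_within / err_df
    d = diffs_by_class[target]
    f_value = len(d) * mean(d) ** 2 / mse
    return f_value, (1, err_df)


# Hypothetical difference scores (e.g., still minus nouns), 13 children per class.
diffs = {
    "lower": [0, 1, -1, 0, 1, 0, -1, 0, 0, 1, -1, 0, 0],
    "middle": [1, 2, 0, 1, 2, 1, 0, 1, 1, 2, 0, 1, 1],
}
f_lower, df_lower = contrast_f(diffs, "lower")
f_middle, df_middle = contrast_f(diffs, "middle")
```

With one numerator df, this F is the square of the corresponding t statistic, so the same function covers each of the four mixed-list contrasts once its difference variable has been formed.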

For the mixed list, the item type differ- 
ences are not consistent across social class. 
For the middle-class subjects, all compari- 
sons made between item types were signifi- 
cant, including a significant effect due to 
prompting for verbal elaboration. For the 
middle-class children, still pictures were 
superior to nouns alone and the addition of 
nouns to still pictures yielded better per- 
formance than did the pictures alone. 
Both prompting for verbal elaboration and 
prompting for visual elaboration signifi- 
cantly improved performance. In contrast, 
the only significant effect found for the 
lower-class group under the mixed-list de- 
sign was for nouns-still versus still. As can 
be seen in Figure 1, the lower-class 
children's performance is virtually identical 
on the nouns-still, sentence-still, and 
nouns-action items. Performance on these 
items is far superior to that for either the 
nouns or the still items. 

The pattern of results suggests that the 
effects of the prompt conditions under the 
between- and within-subjects design are 
very similar for the middle-class subjects, 
but that this is not the case for the lower- 
class subjects. 


Interrelation of Paired-Associate and Ability 
Test Measures 


As was predicted, higher correlations of 
the ability test measures emerged for the 
lower-class group than for the middle-class 
group. Perceptual speed is significantly cor- 
related with the two verbal measures for 
the lower-class sample (vocabulary, r = 
.39; sentence completion, r = .44, n = 97, 
p < .01); neither of these correlations was 
significant for the middle-class sample. The 
correlation of the two verbal measures is 
.56 for the lower-class sample and .48 for 
the middle-class sample (n = 97, p < .01). 

The correlations between the individual 
ability test and paired-associate learning 
for each prompting condition are interest- 
ing, although they must be interpreted 
with caution because of the small sample 
sizes on which they are based. For the pure 
lists for the lower-class sample, perceptual 
speed correlated significantly with Trial 1 
performance under the still condition (r = 
.55, n = 14, p < .05). Significant correla- 
tions were also found for this sample be- 
tween both verbal ability measures and 
Trial 1 sentence-still performance (vocabu- 
lary, r = .59; sentence completion, r = .60, 
n = 14, p < .05) and between perceptual 
speed and Trial 1 of the sentence-action 
condition (r = .74, n = 14, p < .05). Of 
the significant correlations found for the 
middle-class sample, none of these involved 
perceptual speed. The significant correla- 
tions that were found were those of vocabu- 
lary with sentence-still performance on 
Trial 1 (r = .53, n = 14, p < .05) and of 
both verbal ability measures with sentence- 
action performance (Trial 1, sentence com- 
pletion, r = .60; Trial 2, vocabulary, r = 
.55; sentence completion, r = .62, n = 14, 
p < .05). It should be noted here that for 
the middle-class group, performance on the 
paired-associate task under the conditions 
involving prompting for verbal elaboration 
can be predicted from performance on the 
verbal ability measures. This relationship 
holds for the sentence-still but not for the 
sentence-action condition for the lower- 
class group. 

Very few significant correlations were 
found under the mixed-list condition. In 
the lower class, the significant correlations 
were perceptual speed with the still condi- 
tion, Trial 1 (r = .57, n = 13, p < .05); 
vocabulary with nouns, Trial 2 (r = .57, 
n = 13, p < .05); and sentence completion 
with sentence-still, Trial 2 (r = -.68, n 
= 13, p < .05). In the middle class, the 
significant correlations were perceptual 
speed with nouns, Trial 1 (r = -.62, n = 
13, p < .05) and vocabulary with nouns- 
action, Trial 1 (r = .59, n = 13, p < .05). 
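
Because the per-condition samples are small (14 children per pure list, 13 per mixed list), a correlation has to be fairly large to reach p < .05. The sketch below converts r to a t statistic with n - 2 df and compares it with standard two-tailed critical values. The two-tailed criterion is an assumption, since the text does not say which test was used.

```python
import math

# Two-tailed .05 critical values of t from standard tables, keyed by df.
T_CRIT_05 = {11: 2.201, 12: 2.179}


def r_to_t(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def is_significant(r, n):
    """True if |t| exceeds the two-tailed .05 critical value for n - 2 df."""
    return abs(r_to_t(r, n)) > T_CRIT_05[n - 2]


# Values reported in the text: r = .74 with n = 14, and r = .57 with n = 13.
pure_list_check = is_significant(0.74, 14)
mixed_list_check = is_significant(0.57, 13)
# A hypothetical smaller correlation for comparison.
small_r_check = is_significant(0.30, 14)
```

Under these assumptions the two reported correlations exceed the criterion, while a correlation of .30 from 14 children would not.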


DISCUSSION 


The principal aim of this study was to 
compare a within-subjects (mixed-list) with 
a between-subjects (pure-list) design, using 
the same items and conditions in both. The 
question of concern was that of the gen- 
eralizability of results found using the 
mixed-list design. The general conclusion 
that can be reached is that these two de- 
signs yield similar results for the middle- 
class subjects but produce different out- 
comes for the lower-class subjects. 

The effects of elaborative prompting in 
the between-subjects design showed the 
same pattern of results for both social class 
groups. Pictorial presentation yielded supe- 
rior performance to that found for verbal 
labels, but the addition of labels to the 
pictures yielded still better performance. 
Increased prompting for visual elaboration 
(action pictures) significantly improved per- 
formance. In contrast to previous findings 
(Rohwer et al., 1968), increased prompting 
for verbal elaboration (sentences) did not 
significantly improve performance. At the 
present time no adequate explanation is 
available for this difference in results. It is 
possible that the lack of difference between 
sentence-action and nouns-action is due 
to the fact that the children are operating 
at full capacity under nouns-action, so that 
there is no room for further improvement; 
however, this does not explain the lack of a 
significant difference in performance under 
the sentence-still and nouns-still conditions. 

Under the mixed-list design, the results 
for the middle-class children were the same 
as those found for the pure lists except for 
the addition of a significant effect due to 
prompting for verbal elaboration. For the 
lower-class children, however, the only 
comparison that yielded a significant effect 
was that of nouns-still with still. It ap- 
pears that for these children, with the 
mixed-list design, performance is approxi- 
mately equal under all conditions involving 
elaborative prompting and that this per- 
formance far exceeds that found for either 
the simple use of nouns or still pictures. It 
appears as if these children are selecting 
for attention the items containing elabora- 
tive prompting. These items would prove 
much easier for the lower-class children if, 
as Rohwer (1968) suggests, they do not 
spontaneously employ elaboration but do 
utilize elaboration prompts. 

A second focus of the study was on the 
relationship between standard ability mea- 
sures and the paired-associate-learning task. 
It had been predicted that the pattern of 
correlations between the two sets of mea- 
sures would be dependent on task character- 
istics, specifically, that the largest correla- 
tions would be found between the verbal 
tests and verbal items and between the 
perceptual test and pictorial items. Some 
support for this hypothesis can be seen in 
the finding that for the independent groups, 
performance on the paired-associate task 
under conditions that involved prompting 
for verbal elaboration could be predicted 
best from performance on the verbal ability 
measures. The other significant individual 
correlations that were found for the inde- 
pendent groups were also consistent with 
this hypothesis. 

It had further been predicted that the 
intercorrelations among the ability mea- 
sures and the correlations between ability 
measures and the paired-associate-learning 
task would be greater for the lower-class 
than for the middle-class children. In gen- 
eral, the results were consistent with this 
prediction. The intercorrelations of the 
ability test measures were higher for the 
lower-class than for the middle-class chil- 
dren, suggesting that for the former group 
these abilities might not be very differenti- 
ated. 

The overall social class differences found 
for the paired-associate tasks are congruent 
with previous findings. While both Semler 
and Iscoe (1963) and Rohwer et al. (1968) 
reported no social class differences, neither 
of these studies employed the particular 


condition that in the present study showed 
a significant social class effect, that is, the 
nouns condition. This result is consistent 
with that previously reported for third- 
grade samples by Rohwer et al. (1971), 
using a mixed-list design. 

Incidental learning was analyzed in regard to attentional arousal 
and monetary incentive. Subjects scanned passages of a narrative 
for typographical and conceptual errors (intentional performance) 
but were tested additionally for the retention of various aspects of 
the story (incidental learning). Results indicated (a) specific instruc- 
tional directions increased incidental learning without necessarily 
decreasing intentional performance; (b) less specific instructional 
arousal interacted with the presence or absence of a monetary incen- 
tive, suggesting the deleterious effects of too much motivation 
and/or cognitive support upon the learning of irrelevant task ma- 
terial; (c) this interactive effect of incentive and arousal extended 
into the more traditional measures of recall and recognition of facts 
as well as the retention of temporal sequences within the material. 


The area of incidental learning remains a 
highly differentiated collection of research. 
While this work has considered a varied list 
of individual and situational factors and 
their effects upon levels of incidental learn- 
ing, the proliferation of experimental de- 
signs and procedures (types of tasks, in- 
structions to subjects and operational 
definitions of variables) has made a system- 
atic evaluation of any subset of this re- 
search extremely difficult. Additionally, 
research along these lines considering 
theoretical questions and materials of in- 
terest to the educator remains somewhat 
scant. The purpose of the present set of 
experiments was to examine the effects of 
monetary incentives and type of instruc- 
tions upon the level of incidental-type 
learning related to educational settings. 

While the primary focus of this study 
concerns the effects of incentives in in- 
cidental learning, the other manipulation 
in this experiment (type of instruction) 
was chosen because of its obvious significance 
and because of the variance in the reported 
research. In other research similar to the 
methodology of the present study (DuCette 


1 Requests for reprints should be sent to 
Stephen Wolk, Institute for Child Study, College 
of Education, University of Maryland, College 
Park, Maryland 20742. 


& Wolk, 1973), it has been argued that 
instructions increase incidental learning by 
heightening the cognitive prominence of 
stimuli without necessarily negating the 
incidental nature of these stimuli, a finding 
congruent with previous research. The 
reason for this effect remains to be deter- 
mined. 

Several earlier studies have suggested 
somewhat straightforward relationships be- 
tween incentives and intentional and in- 
cidental performance, such as that of 
Bahrick (1954) who found that monetary 
reward for intentional performance sig- 
nificantly increased this performance. How- 
ever, incidental learning decreased when 
such an incentive was present. Johnson and 
Thompson (1962) employing a similar 
design confirmed these findings. Zeffy and 
Bruning (1966) have also supported Bah- 
rick, using measures of anxiety as the motiva- 
tionally produced drive. 

Later research and writing in this area, 
thus, have come to distinguish two types 
of motivational effects possible in the typi- 
cal incidental-learning paradigm. Congruent 
with the work of Bahrick and of Johnson 
and Thompson, there is an incentive- 
oriented type of drive, acting to "funnel" 
attention in a directive manner, resulting 
in a reduction of attention to the irrelevant 
or incidental material. Additionally, how- 
ever, a long list of investigations has con- 
sidered the effects of an emotionally based 
drive, usually represented by an individual 
assessment or a situational inducement of 
anxiety. This type of drive is interpreted as 
producing a generalization of attention 
from relevant to irrelevant aspects of a 
learning situation. Most notably the earlier 
work of Kausler, Trapp, and Brewer 
(1959), using the tasks of Bahrick (1954), 
has led to such a distinction. In this study 
two experiments were reported in which 
anxiety was either assessed individually 
(Manifest Anxiety Scale) or induced by 
differential instructions. In both experi- 
ments the high-drive group was superior to 
the low-drive group in intentional learning 
but did not differ in incidental learning, a 
finding contrary to that of Bahrick and his 
followers. 

A more extensive reading of the litera- 
ture, however, suggests that the plausibility 
of the distinction between incentive and 
emotionally produced motivational effects 
upon incidental learning may be in doubt. 
Duffy (1961), in an unpublished dissertation, 
found high-anxious subjects to remember 
significantly less of the incidental material, 
contrary to the purported generalized at- 
tention effect of such a drive, but similar to 
the funneling effect of an incentive-pro- 
duced drive. Miller and Dost (1964) and 
Stanton (1971), who employed the Bahrick 
intentional and incidental tasks, also found 
detrimental effects of such a drive upon 
incidental learning. Indeed, Dornbush (1965) 
cited evidence that the presence of incen- 
tives for intentional performance may not 
result in a lower level of incidental learning, 
as Bahrick (1954) argued. These studies, 
contrasted with earlier ones, suggest several 
points: 

1. The effects of an emotional drive upon 
incidental learning may be very similar to 
those of an incentive drive (i.e., funneling
of attention to relevant task parameters), 
suggesting a single motivational effect upon 
such learning; additionally, heightening of 
the motivational state (as in the Stanton, 
1971, study) may actually suppress both the 
levels of intentional and incidental learning. 

2. The reliance on purely motivational
interpretations of reduced performance
under increased drive may be too simplistic. 
It is possible that increased activity under 
high-drive states may be channeled into 
more attention to the incidental material. 
This implies that heightened motivation 
not only changes the intensity of the sub- 
ject’s behavior but also his strategy and 
direction. 

It would appear, congruent with the 
above points, that the literature has yet to 
delineate adequately the relationship between
motivational states, specifically material
incentives, and incidental learning.
Moreover, almost total disregard has been 
paid to an experimental task in which 
stimuli are interrelated as opposed to the 
discrete tasks discussed previously. To what 
degree the incidental material is intrinsic to 
the intentional task would appear to play 
an important role in determining levels of 
incidental learning as well as the effect of 
any motivational state. Being extrinsic to 
the main task, additionally, the incidental 
material may lose its "incidental nature"
from a type of isolation effect. Incidental- 
learning research and those studies dealing 
with incentives have also offered little in 
the way of generalizability to instructional 
settings. With these points for considera- 
tion, two experiments were designed to 
determine the effects of monetary incentive 
upon incidental learning in conjunction with
differential instructions to orient attention. 


EXPERIMENT 1 


Method 


Subjects. The subjects were 60 students enrolled 
in several sections of an intermediate level educa- 
tional psychology course during a summer instruc- 
tional program (males = 27; females = 33). 
Neither course requirements nor grades were 
contingent upon participation in the experiment. 

Materials. An intentional-incidental learning 
task, consisting of connected stimuli (textual 
material), the basis for several previous investiga- 
tions (DuCette & Wolk, 1973; Wolk & DuCette, 
1974), was employed. The intentional task had 
the subjects read a story about the discovery of a
fictitious drug (written by the experimenters). The 
story was composed of four paragraphs of seven 
sentences each with approximately 200 words per 
paragraph. Subjects were given two minutes to 
read each paragraph. The general intentional 
instructions were that the subjects were to look
for and circle errors in the paragraphs "as if they
were proofreaders." These errors were of two
kinds: obvious misspellings, run-on words, dupli- 
cated letters, etc. and more conceptual errors 
(inappropriate words, subjects not agreeing with 
verbs, non sequiturs, etc.). There were 20 typo- 
graphical and conceptual errors in each paragraph. 

The measure of incidental learning was the 
retention of specific examples of the following 
categories of items presented in the story: numbers,
colors, countries and cities, individuals'
names, animals, occupations, and parts of the
body. A total of 36 examples of these categories 
were dispersed by having 9 present in each para- 
graph. Following the intentional reading of the 
story, subjects were presented first with a grid
having the seven categories of elements listed. 
Instructions said to list as many of the examples 
of these categories as appeared in the story (recall 
measure). Subjects were allotted six minutes. 
Following this, subjects were presented with a list 
of the seven categories of elements under each of 
which were 15 examples. Among these were the 
actual words that had appeared in the story. For 
each category, the subject was instructed as to 
how many of the examples had actually appeared 
(this was either five or six) and was told to circle 
only that number for his choices but no more 
(recognition measure). The subject was to guess 
if necessary but to make sure he circled the proper 
number. Six minutes were also given.

Procedure and design. Subjects were randomly 
assigned to one of two groups: an incidental group, 
to which only the intentional task directions were 
given but for which retention of the categories 
was totally incidental, and an intentional group, 
which was given, in addition to the intentional 
task directions, an explicit statement concerning 
the nature of the incidental-learning test they 
would be given following the task. (This included 
the actual list of the seven categories of elements 
to be tested.) Each subject was tested in a session
that lasted approximately 25 minutes. 


Results 


A one-way analysis of variance was computed on
three dependent variables: intentional task
performance, that is, number of errors
correctly identified (total possible score
= 80); incidental recall of examples (total
possible score = 36); and incidental recognition
of examples (total possible score = 36).

Presented in Table 1 are the means and
standard deviations of the three dependent
variables. For intentional performance
(searching for errors) there was no significant
difference between groups (F = 1.38,
df = 1/58). The groups did differ, however,
on both measures of incidental learning:
recall (F = 16.22, df = 1/58, p < .05) and
recognition (F = 4.06, df = 1/58, p < .05).

TABLE 1

MEANS AND STANDARD DEVIATIONS OF DEPENDENT
VARIABLES, EXPERIMENT 1

                 Intentional       Incidental learning
Group            performance     Recall        Recognition
Intentional
   M                56.80         10.60           24.37
   SD                8.93          3.50            2.69
Incidental
   M                59.71      [illegible]        21.96
   SD                9.90      [illegible]         3.10

Implications of this will be considered in
conjunction with the results of Experiment
2. Equally important, this experiment
served to establish baseline rates of both
intentional and incidental learning under
two instructional conditions, a considera- 
tion seldom made in most incidental-learn- 
ing studies. 


EXPERIMENT 2 


Method 


Subjects. Subjects in this experiment were 88 
students drawn from several sections of the same 
course as that in Experiment 1 (males = 40,
females = 48). However, this occurred in a semester
of the summer session subsequent to that of
Experiment 1. 

Materials. Both the intentional and incidental 
tasks were the same as those in Experiment 1, 
as were the measures of intentional performance, 
incidental recall, and recognition. However, in 
order to acquire a broader base of evaluation for 
the manipulations in this experiment, the following
incidental-learning measures were added: a
multiple-choice test composed of 14 items assessing
retention of factual information about the
story. Each question stem had five alternative
choices from which the subject was to circle
the correct one (total possible score = 14). This
followed the recognition test. Lastly, subjects
were given a temporal sequencing test composed
of three sets of six events that had occurred in the
story. The subjects' task was to place these in the
temporal order in which they had occurred in the
story by numbering them from 1 to 6 (1 = first
to have occurred). This test was scored by awarding,
for each of the three sets, 2 points for each
fact that had been correctly ranked and 1 point
if the fact had been ranked only one step removed
from its actual rank (total possible score = 36).
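
The scoring rule for the temporal sequencing test can be made concrete with a short sketch. The function names and the sample rankings below are hypothetical and are not taken from the original materials; the sketch only assumes the rule as stated (2 points for an exact rank, 1 point for a rank one step removed, three sets of six events).

```python
def score_set(true_ranks, given_ranks):
    """Score one set of six events.

    2 points for an event placed at its actual rank,
    1 point if it is placed exactly one step away, 0 otherwise.
    """
    points = 0
    for true_rank, given_rank in zip(true_ranks, given_ranks):
        distance = abs(true_rank - given_rank)
        if distance == 0:
            points += 2
        elif distance == 1:
            points += 1
    return points


def score_test(sets):
    """Sum the scores over all (true_ranks, given_ranks) pairs in the test."""
    return sum(score_set(true_ranks, given_ranks) for true_ranks, given_ranks in sets)


# Hypothetical answer key and responses for illustration only.
actual = [1, 2, 3, 4, 5, 6]
adjacent_swap = [2, 1, 3, 4, 5, 6]   # first two events interchanged
reversed_order = [6, 5, 4, 3, 2, 1]  # complete reversal

print(score_set(actual, actual))
print(score_set(actual, adjacent_swap))
print(score_set(actual, reversed_order))
print(score_test([(actual, actual)] * 3))
```

Under this rule a perfect ordering of all three sets yields the stated maximum of 36, and a single adjacent interchange costs only 2 points because both displaced events still earn partial credit.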

Procedure and design. The main experimental
manipulation consisted of the use of monetary
incentives for intentional task performance. In a
previous semester, students similar to the subjects
in the current experiment were questioned about
various types of payoffs in experiments. They
were asked to judge whether individual incentives
for number of errors found ($.03 per item) were
less motivating than several large payoffs for
the highest performers on the task. The latter
alternative was judged more effective. Therefore,
in the present study half of the subjects were
randomly assigned to an incentive condition.
When they reported for testing they were told
that the three highest performers on the intentional
(proofreading) task would receive prizes of
$50.00, $30.00, and $10.00. Prizes were awarded
at the termination of the study.

TABLE 2

MEANS AND STANDARD DEVIATIONS OF DEPENDENT
VARIABLES BY EXPERIMENTAL GROUPS,
EXPERIMENT 2

                            High attentional arousal     Low attentional arousal
Measure                     No incentive   Incentive     No incentive   Incentive
Intentional performance
   M                           57.00         58.27          55.40         61.42
   SD                          10.71         10.54          10.10          8.30
Incidental recall
   M                            9.43          6.68           6.05         10.00
   SD                           3.94          2.41           3.39          3.06
Incidental recognition
   M                           24.23         23.04          22.30         25.18
   SD                           2.91          2.41           2.99          2.75
Multiple choice
   M                        [illegible]       5.18           5.00          6.64
   SD                       [illegible]       1.73           2.12          1.83
Temporal sequencing
   M                           18.47         16.63          15.10         18.81
   SD                           4.14          4.97           4.93          3.81

An additional manipulation consisted of atten- 
tional arousal for the incidental material through 
instructions. Half the subjects were randomly 
assigned to a condition in which no information 
concerning the incidental material was given, 
corresponding to the incidental group of Experi- 
ment 1. The remaining subjects were informed 
that their knowledge of “some aspects of the 
story" would be questioned, instructions more 
intermediate than those of the intentional group 
of Experiment 1. Thus, the influences of both
material incentives and instructions, as well as
the interactive influences of both, were examined 
for both the intentional task and the various 
measures of incidental learning. Subjects were 
tested in a 40-minute session. 


Results 


The statistical design was a 2 × 2 analysis
of variance with independent groups. Table 
2 presents the means and standard devia- 
tions of all dependent variables by experi- 
mental groups. Results of the analyses of 
variance revealed that there were no sig- 
nificant main effects or interactions for 
intentional performance. For all measures 
of incidental learning, however, the inter- 
action term was always significant, although 
again there were no main effects. These 
interactions were obtained for the variables 
of recall (F = 22.69, df = 1/84, p < .001),
recognition (F = 11.46, df = 1/84, p <
.001), multiple choice (F = 10.44, df =
1/84, p < .01), and temporal sequencing
(F = 12.39, df = 1/84, p < .001).

As a further explication of the significant
interaction effects for all measures of in- 
cidental learning, comparisons among means 
were made using the Tukey test for dif- 
ferences among means (Kirk, 1968). These 
are summarized as follows: 

1. There were no significant differences
for any variable between the
high-attentional-arousal-incentive and
low-attentional-arousal-no-incentive conditions,
nor between the high-attentional-arousal-no-incentive
and low-attentional-arousal-incentive
conditions.

2. Under the incentive condition, high 
attentional arousal produced significantly 
poorer incidental learning than did low 
attentional arousal for recall (p < .01),
recognition (p < .01), multiple choice (p < 
.05), and temporal sequencing (p < .05). 

3. Under the no-incentive condition,
high attentional arousal produced significantly
higher incidental learning than
did low attentional arousal for recall (p <
.01), recognition (p < .01), multiple choice
(p < .05), and temporal sequencing (p <
.01).
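
The crossover pattern summarized in these comparisons can be illustrated from the cell means alone. The sketch below is not the authors' analysis: it uses only the recall and recognition means from Table 2, assumes the columns run no incentive then incentive within each arousal level, and uses unweighted cell means because the individual cell sizes are not reported. The function names are hypothetical.

```python
# Cell means from Table 2, keyed by (attentional arousal, incentive).
CELL_MEANS = {
    "recall": {
        ("high", "none"): 9.43, ("high", "incentive"): 6.68,
        ("low", "none"): 6.05, ("low", "incentive"): 10.00,
    },
    "recognition": {
        ("high", "none"): 24.23, ("high", "incentive"): 23.04,
        ("low", "none"): 22.30, ("low", "incentive"): 25.18,
    },
}


def arousal_effect(cells):
    """Unweighted marginal difference: high minus low attentional arousal."""
    high = (cells["high", "none"] + cells["high", "incentive"]) / 2
    low = (cells["low", "none"] + cells["low", "incentive"]) / 2
    return high - low


def incentive_effect(cells):
    """Unweighted marginal difference: incentive minus no incentive."""
    paid = (cells["high", "incentive"] + cells["low", "incentive"]) / 2
    unpaid = (cells["high", "none"] + cells["low", "none"]) / 2
    return paid - unpaid


def interaction_contrast(cells):
    """Difference between the incentive effects at the two arousal levels."""
    at_high = cells["high", "none"] - cells["high", "incentive"]
    at_low = cells["low", "none"] - cells["low", "incentive"]
    return at_high - at_low


for measure, cells in CELL_MEANS.items():
    print(
        f"{measure}: arousal {arousal_effect(cells):+.2f}, "
        f"incentive {incentive_effect(cells):+.2f}, "
        f"interaction {interaction_contrast(cells):+.2f}"
    )
```

Both marginal differences are small relative to the interaction contrast, which is the descriptive signature of a crossover interaction with no main effects. Significance still depends on the within-cell variability, which this sketch does not use.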


Discussion 


While it is somewhat surprising that 
neither attentional arousal nor incentives
had a main effect on cover task performance,
it is even more interesting that there was no 
interaction between the two. It will be 
argued that the lack of these expected 
effects is not due to either manipulation’s 
essential irrelevance but to the fact that 
cover task performance was, for all practical 
purposes, at asymptote for all subjects 
regardless of condition. This does not mean 
that this specific nonlearning, orienting task 
under the conditions used in this experi- 
ment is cognitively taxing, for we will 
argue later that it is not, but only that the 
particular conditions under which the task 
was given did not allow variance in per- 
formance for these particular subjects. Both 
experiments support this (over-all mean 
intentional performance for Experiment 1 = 
58.25; for Experiment 2 = 58.02). 

Since the pattern for all of the measures 
of incidental learning was the same, they 
are discussed together. Based on previous 
research it was possible to hypothesize the 
effects of both types of manipulations. In 
reference to instructions, it was logical to 
assume that giving subjects orienting in- 
structions for the incidental material would 
enhance performance. This was strongly 
demonstrated in Experiment 1 in which the 
incidental learning was essentially in- 
tentional, but not in Experiment 2. One 
possible theoretical position concerning in- 
centives, based on the work of Kausler et al. 
(1959) and Stanton (1971), predicts a 
facilitating effect of motivation due to a 
generalized attentional effect from the in- 
tentional to the incidental task. The other 
position (Bahrick, 1954; Johnson & Thomp- 
son, 1962) predicts a decrease in incidental 
learning due to the funneling effect on 
attention. It is evident that neither position 
is fully supported by the data. While it is 
true that either cue explication or incentives 
significantly improve performance over the 
baseline (the low-attentional-arousal-no-in- 
centive condition), the two in combination 
seem to cancel each other out. There are 
two explanations of this interaction that 
seem plausible. 

The first possibility that is explored as- 
sumes that the two manipulations used in 
the experiment involved two different processes:
one cognitive (instructions) and
one motivational (incentives). Either of
these processes improves incidental learning, 
but for different reasons: Instructions 
organize and structure the input; incentives 
increase the attention paid to all aspects 
of the tasks. This could explain the equality 
of the high-attentional-arousal-no-incentive 
and low-attentional-arousal-incentive conditions.
This position, however, seems
incapable of explaining the low level of 
performance in the high-attentional-arousal- 
incentive condition (not different from the 
low-attentional-arousal-no-incentive condition).
It would, in fact, predict that the
condition most facilitating for incidental
learning would be the condition in which
both instructions and incentives were
present. The data show that this prediction
is not upheld, bringing such a two-process 
model into question, at least for this ex- 
periment. 

The second possibility assumes that both 
cue explication and incentives are motiva- 
tional manipulations. From this it follows 
that the poor performance in
the high-attentional-arousal-incentive condition
is due to the subjects being overmotivated,
an example of the Yerkes-Dodson
law (Broadhurst, 1957). This explanation
is congruent with previous theoretical
interpretations of incidental learning. For 
example, Ryan (1970), in his review of in- 
cidental learning, points out that intentional 
instructions to learn (which are analogous 
in this experiment to the high-attentional- 
arousal condition) are often interpreted as 
being a motivational manipulation. If this 
is correct, then subjects in the high-atten- 
tional-arousal-incentive condition had two 
sources of motivation acting upon them.
Since the incidental task was extremely 
difficult, the debilitating effect of this 
heightened motivation is exactly as de- 
scribed in previous research (Spence & 
Taylor, 1951). In addition, this explanation 
is as capable as the first of explaining the
heightened performance in the two alternate
cells. In both cases, the subjects were more
motivated than the subjects in the
low-attentional-arousal-no-incentive condition
and thereby performed at higher levels.
Of the two explanations, therefore, the
interpretation of the results from a purely
motivational perspective seems more powerful.

The data from this experiment would 
seem to indicate that both the generalizing 
and funneling effects of motivational arousal 
on attention in incidental learning may be 
applicable, but in different situations. In 
the absence of other sources of motivation 
and/or cognitive supports, motivation may 
serve to generalize attention. With too 
much motivation or the presence of cogni- 
tive supports that place too much stress on 
the organism, however, attention becomes 
selective and narrowly focused. Further re- 
search in this area must be more precise than 
previous research in specifying the nature of 
the task used and the level of motivation 
employed. Since incidental learning is of 
such significance in all aspects of education, 
this point cannot be overlooked. A teacher 
may, in fact, be decreasing the amount of 
nondirected material a student learns by 
jointly manipulating a set of conditions 
under which the material is presented. 
When these factors and the interactions 
between them are delineated, the dynamics 
of incidental learning can be better specified.

A 3 × 2 × 2 factorial design with 108 college undergraduate subjects
was used to study the short- and long-term retention effects of viewing
testlike factual questions in conjunction with reading a written
instructional passage. A simple change in the position of the interspersed
questions produced different mean postreading retention
performances. Subjects in questioned groups retained significantly
(α = .05) more question-relevant information than a nonquestioned
control group on both an immediate and seven-day delayed retention
measure. The postquestioned group retained more question-incidental
content than either the prequestioned or control groups
on both retention tests.

The main purpose of this study was to 
investigate the effects of interspersed pre- 
and postquestions on the delayed retention 
of question-relevant and question-incidental 
prose material. With the exception of a few 
studies (Natkin & Stahler, 1969; Peeck, 
1970), the focus of investigation has been 
on immediate retention measures obtained 
shortly after the subjects have read and
studied the stimulus material (e.g., Bruning,
1968; Frase, 1967, 1968; Rothkopf,
1965, 1970). Although this type of short- 
term retention may at times be a pertinent 
educational objective, the goals of educa- 
tion more commonly involve long-term re- 
tention of previously learned material. If 
the delayed retention effects of interspersed 
questions could be demonstrated to be 
similar to those found in studies that used 
only immediate retention measures, important
practical and theoretical implications
for certain instructional processes
could result.


* Requests for reprints should be sent to John 
R. Boker, Department of Educational Psychology, 
Pennsylvania State University, 201 Social Sciences 
Building, University Park, Pennsylvania 16802.

METHOD

Subjects 

The subjects were 108 undergraduate students 
enrolled in the introductory educational psychol- 
ogy course at Pennsylvania State University. 
Extra credit toward the course grade was given 
to every subject for participation in the study. 
To control for prior background knowledge, 
students who majored in fields concerned with 
biology and historical geology were not permitted 
to participate in the study. 


Materials 


Passage. A prose passage consisting of 10
sections of 250 words each was prepared from
college level textbooks. Each separate section
was one of a series of topically related but relatively
independent factual segments covering
selected topics in historical geology. The topics
were chosen on the basis of being generally unfamiliar
to most subjects who participated in the
study, and this ensured that all subjects started
from the same approximate baseline in learning
the prose material.

Questions. Four multiple-choice items, each
with four alternatives, were constructed for each
of the 10 sections. The 40 items tested relatively
independent factual bits of information that were
evenly spaced throughout each section. Two items
for each section were selected randomly and
designated as interspersed questions. These 20
items constituted a criterion posttest to measure
the retention of question-relevant content. The
remaining 20 items appeared in the posttest as
the measure of retention of question-incidental
content. Preliminary testing indicated that both
sets of questions were of equal difficulty.

Retention tests. An objective, immediate retention
test was constructed by arranging the 40
multiple-choice items in random sequence. Re- 
liability of the test (Kuder-Richardson Formula 
20) was .75. A delayed retention test was con- 
structed by rearranging the 40 items in a new 
random sequence. The test-retest correlation 
between both total test scores was .80 when com- 
puted on only the control group data over the 
seven-day retention interval. 
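
For readers unfamiliar with the reliability index cited here, the following is a minimal sketch of Kuder-Richardson Formula 20 for dichotomously scored items. The response matrix is a hypothetical toy example, not data from the study, and the sketch assumes the population (divide-by-N) variance of total scores, which is one common convention.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / variance of total scores),
    where p_i is the proportion passing item i and q_i = 1 - p_i.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    total_scores = [sum(row) for row in responses]
    total_variance = pvariance(total_scores)  # population variance (assumption)
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical scores for five examinees on three items (1 = correct).
toy_responses = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(round(kr20(toy_responses), 3))
```

The coefficient rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency rather than of stability over time; the test-retest correlation reported above addresses the latter.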


Design 


The design of the study implied a 3 × 2 × 2
analysis of variance with repeated measures on 
the last two factors. The first factor was called 
treatments and represented the manipulation of 
the position of the interspersed questions within 
the reading passage. The prequestioned group 
viewed each two-item set of interspersed questions 
prior to reading the prose section to which the 
questions related. The postquestioned group 
viewed the interspersed questions after reading 
the related section. The control group simply 
read through all 10 sections without viewing any 
interspersed questions. 

The two repeated measures factors were test 
time and retention type. The test time factor 
represented the immediate and delayed retention 
tests; whereas retention type referred to either 
question-relevant or question-incidental reten- 
tion. 


Procedure 


The subjects were randomly assigned to one of 
the three experimental treatments for the first 
part of the study. Each subject was given an 
envelope containing the stimulus materials and 
was cautioned by the experimenter not to remove 
the materials from the envelope until instructed 
to do so. The materials took the form of an unbound
stack of 8½ × 11 inch mimeographed
pages on which the sections of the passage and 
the interspersed questions (for prequestioned and 
postquestioned conditions) were reproduced. 
Each mimeographed page contained either 1 of 
the 10 prose sections or, when required by the
experimental condition, the two interspersed 
questions that related to a specific prose section. 

The subjects were instructed to (a) work 
through the materials at their own rate, (b) read 
each page in the sequence in which it appeared, 
(c) place each page back into the envelope after 
they had finished reading it, and (d) not go back 
to a page once it was placed back into the en- 
velope. Subjects in the prequestioned and post- 
questioned groups received additional instruc- 
tions to answer directly on the page any questions 
that were encountered during the reading of the 
passage. No knowledge of results was provided 
for the interspersed questions, however. These 
two groups were also instructed to follow the
same procedure as they did with the prose section
pages when each page that contained questions
was completed. No note taking was allowed, and
the subjects were told to expect a test upon completion
of their reading.

TABLE 1

MEANS AND STANDARD DEVIATIONS OF RETENTION
TEST SCORES FOR EACH TREATMENT GROUP

                        Question         Question
                        relevant         incidental        Total
Treatment               M       SD       M       SD       M       SD
Immediate retention
   Prequestions       15.19    2.23    12.22    2.23    27.41    3.75
   Postquestions      16.00    2.26    14.92    2.29    30.92    3.86
   Control            13.17    3.27    13.56    3.04    26.73    5.06
Delayed retention
   Prequestions       13.78    2.39    10.94    1.97    24.72    3.71
   Postquestions      15.17    2.57    13.92    2.21    29.09    4.02
   Control            12.53    3.03    11.47    2.46    24.00    4.70

Note. n = 36 for each treatment group.

When each subject completed his reading, he 
answered a short questionnaire that required 
some personal data and then began the immediate 
retention test. The subjects received the delayed 
retention test one week later. Both tests were 
intended to be power tests, therefore no time 
limit was imposed upon the subjects. 


RESULTS AND DISCUSSION


The mean immediate and delayed re- 
tention scores for each treatment group are 
shown in Table 1. The overall analysis of 
variance performed with respect to these 
means indicated a significant test time 
effect (F = 63.69, df = 1/105, p < .001).
The decrease in retention means of 1.21 
points (about 10%) over the one-week 
retention interval was expected because
subjects were not permitted to review the 
passage at any time. However, none of the 
interactions involving the test time factor 
were significant. The absence of test time 
interactions indicated that the same trends 
were found for immediate and delayed re- 
tention measures. 

The overall analysis of variance also 
revealed a significant Treatments × Retention
Type interaction (F = 13.37, df =
2/105, p < .001). Analysis of the question-relevant
retention means with Tukey's
WSD test (α = .05) revealed that both the
postquestion and prequestion means were 
significantly larger than the control mean, 
but they were not significantly different 
from each other. The means were 15.58, 
14.49, and 12.85 for postquestioned, pre- 
questioned, and control groups, respec- 
tively. The results extended previous find- 
ings that viewing interspersed pre- or post- 
questions during the reading of instructional 
passages facilitated the retention of factual 
content related to those questions to ap- 
proximately the same degree. The present 
data show that this effect holds for either 
immediate or delayed retention measures. 
A separate analysis of the question-inci- 
dental retention means (collapsed over the 
time factor) confirmed the previously 
established general superiority of the post- 
questioned group. Using the Tukey WSD 
test (α = .05), the results indicated that the
prequestion and control means were not 
significantly different from each other, but 
both were significantly lower than the post- 
question mean. Means of 14.42, 11.58, and 
12.51 were obtained for postquestioned, 
prequestioned, and control groups, re- 
spectively. Again, the results substantiated 
previous findings that only interspersed 
postquestions facilitate the retention of
question-incidental factual content from 
written instructional passages. Preques- 
tioning actually depressed question-inci- 
dental retention scores below those of the 
nonquestioned control group (although this 
difference was not significant). The present
data extend previous findings in that the
facilitative effects of postquestions on retention
of incidental content were relatively
constant over the seven days. The
present results confirmed the overall (rele- 
vant plus incidental) superiority of inter- 
spersed postquestions in that postquestions 
facilitated the retention of question-relevant 
content equally as well as did prequestions, 
but the postquestions were markedly su- 
perior in their question-incidental retention 
effects. The mean retention differences ob- 
served in the present study give support to 
Rothkopf's (1970) mathemagenic hypothesis.
However, if the subjects could review
the text, skipping back and forth, then even 
postquestions might depress incidental learn- 
ing from written instructional passages. 
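
The collapsed question-relevant means quoted in the text can be recovered from the cell means in Table 1. The sketch below is an illustrative check, not part of the original analysis. It assumes that averaging the immediate and delayed means is appropriate because the same 36 subjects per group took both tests; the variable and function names are hypothetical.

```python
# Question-relevant cell means from Table 1: (immediate, delayed).
RELEVANT_MEANS = {
    "postquestioned": (16.00, 15.17),
    "prequestioned": (15.19, 13.78),
    "control": (13.17, 12.53),
}


def collapse_over_time(cell_means):
    """Average a group's immediate and delayed retention means."""
    return sum(cell_means) / len(cell_means)


collapsed = {
    group: collapse_over_time(cells) for group, cells in RELEVANT_MEANS.items()
}

# Advantage of each questioned group over the nonquestioned control group.
gain_over_control = {
    group: mean - collapsed["control"]
    for group, mean in collapsed.items()
    if group != "control"
}

for group, mean in collapsed.items():
    print(f"{group}: {mean:.3f}")
for group, gain in gain_over_control.items():
    print(f"{group} minus control: {gain:.3f}")
```

The collapsed values agree with the reported means of 15.58, 14.49, and 12.85 to within rounding, and both questioned groups exceed the control group by more than a point and a half on question-relevant items.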

Two levels of mental abilities were hypothesized to interact with
socioeconomic status and/or race such that (a) socioeconomic-status
differences were greater for Level II than for Level I abilities and
(b) the correlation between Levels I and II and the regression of Level
I upon Level II were greater in upper- than in lower-socioeconomic-status
populations. These hypotheses were borne out by the present
data, consisting of Level I measures (digit-span memory) and Level
II measures (Lorge-Thorndike Intelligence Tests, verbal and nonverbal)
obtained on all white and black pupils in Grades 4-6 in one
school district. The largest effects were attributable to differences
between the white population and the low-socioeconomic-status
black group.

The present study tests Jensen's Level I-
Level II theory of mental abilities in a 
total school population. The theory has been 
tested heretofore only with specially selected 
samples from the population. 

The theory and related evidence have 
been presented in detail elsewhere (Jensen, 
1968, 1969, pp. 109-117, 1970a, 1970b, 1973, 
pp. 193-293; Jensen & Rohwer, 1968). 
Briefly, the theory involves two types of 
mental abilities, Level I and Level II, and 
their interaction with population (socioeco- 
nomic status and/or race) differences. Level 
I ability consists of rote learning and pri- 
mary memory; it is the capacity to register 
and retrieve information with fidelity and is 
characterized essentially by a relative lack of 
transformation, conceptual coding, or other 
mental manipulation intervening between
information input and output. Level II
ability, in contrast, is characterized by 
mental manipulation of inputs, conceptuali- 
zation, reasoning, and problem-solving; it is 
essentially the general intelligence (g) factor 
common to most complex tests of mental 
ability and standard tests of intelligence. 
Level I abilities are best measured by rote- 
learning tasks: serial learning, repeated 
trials of free recall of a number of succes- 
sively presented familiar uncategorized 
objects, pictures, or nouns, and tests of 
short-term memory, such as digit span. 
Level II ability is best measured by tests of 
general intelligence that have a high general 
intelligence loading and especially those of 
the nonverbal, fluid-intelligence, culture-fair 
variety. 

An interesting point about Level I and 
Level II abilities is their interaction with 
socioeconomic status and race, as has been 
shown in the articles cited previously. The 
first studies showed mainly that in groups of 
children selected for low Level II ability 
(IQs of 60-80), the low-socioeconomic- 
status children (white or black, although 
socioeconomic status and race are con- 
founded in some studies) obtain markedly 
higher scores on Level I tests (usually ap- 
proaching children with average IQs of 90- 
110) than are obtained by the middle- or 
upper-socioeconomic-status children with
the same low IQs. On Level I tasks, middle-
socioeconomic-status children with low IQs
perform more in accord with their low IQ,
while low-socioeconomic-status children per-
form more like children of average IQ. This
finding suggests a lower correlation between
Level I and Level II ability in low-socioeco-
nomic-status than in middle-socioeconomic-
status populations. Also, it means that, in
general, groups differing in socioeconomic 
status should differ less in Level I ability 
than in Level II ability. Thus it was sug- 
gested that if Level I ability could be made 
more important in the educative process, 
there might be a chance of diminishing the 
present large differences in scholastic per- 
formance associated with socioeconomic 
status and racial group differences in Level 
II ability, which is known to correlate highly 
with scholastic achievement in the prevailing 
system of education. 

The earlier studies were based on a 2 × 2
analysis of variance design: high (or middle) 
socioeconomic status versus low socioeco- 
nomic status and high IQ (100-120) versus 
low IQ (60-80) thus forming four groups. 
Typically there were equal numbers of 
subjects (20-40) in each of the four groups. 
The low-IQ groups were often selected from 
classes for the educable mentally retarded 
with average IQs slightly below 70 (since 
75 is the cutoff for admission to educable 
mentally retarded classes in California public 
schools). Because of the difficulty of match- 
ing low- and high-socioeconomic-status 
groups for high IQ, the “high-IQ” groups 
were usually only slightly above average 
(i.e., IQ of 105-110). The socioeconomic-
status difference in Level I (learning) ability
for the low-IQ groups was always highly
significant, but the low- and high-socioeco-
nomic-status groups of high IQ (i.e., about
an IQ of 105-110) usually did not differ
significantly, although the crossover or dis-
ordinal type of interaction usually appeared.

The present Level I-Level II hypotheses 
can be stated in their simplest form as fol- 
lows: 

1. Social classes do not differ, on the 
average, in Level I ability, but differ on 
Level II ability. (Another way of stating this 
is that Level I ability is not correlated with 
socioeconomic status and Level II ability is 
positively correlated with socioeconomic
status.) 

2. The regression of Level I upon Level 
II ability is greater (i.e., steeper slope of the 
regression line) in upper- and middle-socio- 
economic-status populations than in low- 
socioeconomic-status populations. A less 
general corollary of this is that the correla- 
tion between Level I and Level II is greater 
in upper- and middle-socioeconomic-status 
populations than in low-socioeconomic-
status populations. It is less general because
restriction of the range of talent can affect 
the size of the correlation coefficient; whereas 
the slope of the regression line remains the 
same even if the distribution on one or both 
of the variables is truncated and the variance 
is thereby reduced. The correlation is 
lowered, therefore, but the slope of the 
regression line remains unchanged. The 
slope of the regression line (of Level I on 
Level II), therefore, is a more stable and 
fundamental datum. Thus, a proper test of 
the hypothesis should involve testing the 
difference in the regression of Level I upon 
Level II in low- and middle-socioeconomic- 
status groups. 
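
The contrast between the correlation and the slope under restriction of range can be illustrated with a small simulation. The sketch below is not part of the original analysis: the sample size, the true slope of .6, the seed, and the function names are all assumptions chosen for illustration. It truncates the sample on the predictor (Level II) only, the case in which the slope of Level I on Level II is expected to be preserved under linear regression; selection on the dependent variable is not covered by this sketch.

```python
import random
from math import sqrt


def slope_and_r(xs, ys):
    """Return (least-squares slope of y on x, Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)


def simulate(n=100_000, true_slope=0.6, seed=0):
    """Hypothetical standardized scores: x plays Level II, y plays Level I."""
    rng = random.Random(seed)
    resid_sd = sqrt(1 - true_slope ** 2)  # keeps the variance of y near 1
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, resid_sd) for x in xs]
    return xs, ys


def truncate_on_x(xs, ys, lower):
    """Keep only cases whose predictor score is at or above `lower`."""
    kept = [(x, y) for x, y in zip(xs, ys) if x >= lower]
    return [x for x, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    xs, ys = simulate()
    print("full sample      slope, r:", slope_and_r(xs, ys))
    tx, ty = truncate_on_x(xs, ys, 0.0)  # upper half of Level II only
    print("truncated sample slope, r:", slope_and_r(tx, ty))
```

Under these assumptions the two slopes should agree closely while the correlation in the truncated sample falls well below that of the full sample.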

The regression of Level II upon Level I 
has not been a part of the theory and cannot 
be inferred from the theory unless certain 
assumptions are made, assumptions for 
which at this point there seems to be no real 
theoretical basis. The lines of regression of 
Level II upon Level I can be determined only 
if we assume a precise value of the correla- 
tions between Levels I and II in low- and 
middle-socioeconomic-status groups. The 
theory posits no precise values, for no specific 
value exists for the general case. The correla- 
tions are merely population parameters, 
which may vary according to the popula- 
tions sampled and the method of classifying 
individuals by socioeconomic-status group. 
The theory only posits that the regression 
coefficient (i.e., slope) of Level I on Level II
is greater (i.e., steeper slope) in middle-
socioeconomic-status than in low-socioeco-
nomic-status populations. The posited dif-
ference thus is directional rather than pre-
cisely quantitative. If the variances are
assumed to be the same in the middle- and 
low-socioeconomic-status groups (an as- 
sumption that is independent of the theory), 
then the correlation between Levels I and II
will be greater in the middle- than in the low- 
socioeconomic-status group. Assuming this 
to be the case, then, the regression lines (of 
Level II upon Level I) should gradually 
converge. But the convergence would be 
very gradual, and assuming realistic values 
of the Level I-Level II correlations in the 
low- and middle-socioeconomic-status groups, 
there would be no point within ±3 sigmas
of the mean of a normal distribution (which 
includes 99.74% of the population) of Level 
I scores at which low- and middle-socioeco- 
nomic-status groups matched on Level I 
ability would be equal in Level II ability or
where the low-socioeconomic-status group 
would exceed the middle-socioeconomic- 
status group in Level II ability. If the means 
of the low- and middle-socioeconomic-status 
groups were assumed to differ by 1 sigma 
and if the Level I-Level II correlations 
were .6 and .4 in the middle- and low-socio- 
economic-status groups, respectively, we 
would have to match subjects from the two 
socioeconomic-status groups for Level I 
scores at least 5 sigmas below the common 
Level I mean in order for the low-socioeco- 
nomic-status subjects, on average, to equal 
or exceed the middle-socioeconomic-status 
subjects on Level II ability. But any subjects 
who were 5 sigmas below the mean on Level 
I ability would be in the range of severe 
mental defect, at the imbecile or idiot level, 
where the deficit is more likely due to a ma- 
jor gene or chromosomal anomaly or to or- 
ganic damage, rather than to the normal var- 
iations in the polygenic and environmental 
determinants of mental variation that oper- 
ate in the bulk of the population. For most 
of the normal population, the regression 
lines of Level II upon Level I for the two
socioeconomic-status groups would be prac- 
tically parallel. Estimating the point of con- 
vergence at 5 sigmas below the mean as- 
sumes linearity of regression all the way 
down into the range of severest mental de- 
fect, and since the causal factors in that range 
are different than for the rest of the dis- 
tribution, such an assumption is quite un- 
warranted. Within reasonable boundary con- 
ditions for the operation of the theory, the 
lines of regression of Level II upon Level 
I should be pictured as almost parallel, with 
such slight convergence that the lines would
not come together within the range of abil- 
ities normally found in the public schools. 
These hypotheses are depicted graphically 
in Figure 1. 
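
The 5-sigma figure in the example above follows from setting the two regression lines of Level II upon Level I equal. The sketch below restates that arithmetic under the stated assumptions only: scores in sigma units, equal variances in both groups, a common Level I mean, and a Level II mean gap of 1 sigma. The function name and parameter names are hypothetical.

```python
def matching_point(level2_mean_gap, r_middle, r_low):
    """Level I score (in sigmas from the common Level I mean) at which the
    two regression lines of Level II upon Level I meet.

    With the low group's Level II mean as origin, the lines are
        middle group: level2 = level2_mean_gap + r_middle * z
        low group:    level2 = r_low * z
    Setting them equal gives z = -level2_mean_gap / (r_middle - r_low).
    """
    if r_middle == r_low:
        raise ValueError("parallel lines never meet")
    return -level2_mean_gap / (r_middle - r_low)


if __name__ == "__main__":
    # Values taken from the text: 1-sigma gap, correlations of .6 and .4.
    print(matching_point(1.0, 0.6, 0.4))  # about -5 sigmas
```

With a smaller difference between the two correlations the matching point moves still further from the mean, which is why the lines appear nearly parallel over the ordinary range of scores.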

As can be seen, the angles between the 
regression lines are different for the lower- 
and middle-socioeconomic-status groups
(angles l and m), being smaller for the middle
socioeconomic status. (The cosine of the 
angle between the regression lines is equal to 
the correlation coefficient when sigma is the 
same on both variables or the scores are 
standardized with the same sigma.) 

3. The third element of the theory con- 
cerns the hierarchical relationship between 
Level II and Level I ability. The develop- 
ment of Level II ability, as well as Level II 
performance itself, is seen as having some 
functional dependence upon Level I ability, 
but the reverse is not true. For example, 
initial learning of the information and cogni- 
tive skills involved in Level II performance 
may depend in part upon short-term memory 
and its consolidation, which are Level I 
processes. Thus an individual with superior 
Level I ability will in the long run show 
better Level II performance than will a 
person with the same genetic and environ-
mental potential for Level II ability but with
poorer Level I ability.

FIGURE 1. Hypothetical regression lines for
relationship between Level I and Level II abilities
in middle- and lower-class populations (angles
l and m). [Axes: Level II test scores, low to high
(horizontal); Level I test scores, low to high (vertical).]

Also, it seems reasonable to suppose that some short-term
memory can be involved in solving Level II 
problems, such as Raven matrices items or 
the mental arithmetic subtest of the 
Wechsler scale, in which information must 
be retained in memory (i.e., Level I) while 
mental operations are being performed on it 
(i.e., Level II). A relatively pure Level I test, 
such as digit-span memory, on the other 
hand, can hardly be seen as depending upon 
the processes of abstraction, generalization,
and conceptualization that are called for in 
Level II tests. 

Another way of stating the hierarchical 
relationship between Levels I and II, that is, 
the functional dependence of Level II upon 
Level I ability, is to say that Level I is neces- 
sary but not sufficient for the development 
and operation of Level II ability. A conse- 
quence of this hierarchical formulation 
would be that one would seldom if ever find 
individuals with very high Level II ability 
who have very low Level I ability. The re- 
verse, however, would not be uncommon, 
that is, persons with high Level I ability but 
low Level II ability. (In fact, quite extreme 
idiot savants of this type are known to 
exist.) As Matarazzo (1972, p. 204) has noted 
in connection with the clinical use of the
Wechsler intelligence test, a low score on
memory span for digits is highly related to
general mental retardation, while a high
score on digit span is not highly indicative of
superior general intelligence. Matarazzo
states, 


Ordinarily, an adult who cannot repeat at
least four or five digits forward is [in about 9
cases out of 10] either organically impaired or
mentally retarded. Nevertheless, mental retardates
sometimes do well on the Memory Span Test
[p. 205].


This observation suggests a hierarchical (or
necessary-but-not-sufficient) type of rela- 
tionship between Levels I and II. 

If this is in fact the case, the dispersion
(i.e., the standard error of estimate) of Level
I scores about the line of regression of Level
I upon Level II should show a gradual and
regular decrease in going from lower to 
higher scores on Level II. Thus the relative 
magnitudes of the standard error of estimate 
of Level I scores for low and high scores on 
Level II provide a test of the hypothesis of
hierarchical dependence of Level II upon 
Level I. That is, if the hypothesis is true, we
should expect to find a larger dispersion of
Level I scores in the lower range of Level II
scores than in the higher range of Level II
scores.

The purpose of the present study was to 
test each of these three main hypotheses 
(described under 1, 2, and 3) derived from 
the Level I-Level II theory of mental abil- 
ities. 


METHOD


Subjects 


The 2,612 subjects in this study consisted of 
virtually all the white (n = 1,489) and black (n = 
1,123) children enrolled in regular classes of the 
fourth, fifth, and sixth grades from all 14 elemen- 
tary schools of the Berkeley Unified School Dis- 
trict in California. 

The small percentage of children who were 
absent on the particular day that their class was 
tested are not included in this study. Also, test 
data on all children not classified in the school 
records (and according to their own parent(s)) as
either white or black were excluded from the 
present study. (These excluded subjects, mostly 
Orientals, comprised about 10% of the total school 
population.)

The adult white population in this district is 
largely of middle or upper-middle socioeconomic 
status; the three largest employers (mostly of
whites) are the university, the Lawrence Radia- 
tion Laboratory, and a large pharmaceutical firm, 
all of which employ workers with better than 
average education and socioeconomic status for 
the white population as a whole. The adult black 
population is predominantly of lower-middle to low
socioeconomic status, comprised largely of semi-
skilled and unskilled workers, although it is a
somewhat higher-socioeconomic-status group than 
the black populations in the surrounding com- 
munities, with fewer unemployed and on welfare. 

All tests were group administered to the regular 
classrooms by a staff of testers (3 whites and 3 
blacks) who were specially employed and trained
for this purpose. The white and black testers 
were assigned to classes at random. In any given 
class, the Level I and Level II tests were always 
administered by different testers on different 
days, never more than one week apart. Thus the
correlations between the Level I and Level II
tests would not be systematically affected by any
individual tester biases. 


Tests 


Control tests. Two different control tests were 
used, one in each of the two testing sessions. The 
main purposes of the control tests were to set a
good test-taking attitude in the class, emphasizing 
attention and effort while at the same time lessen-
ing tension and test anxiety by giving subjects
tasks they could perform successfully simply by 
being attentive and trying their best. 

The listening-attention test was given just 
before the Level I test (memory for numbers). 
The listening-attention test measures the child's 
ability to attend to and follow orally given direc-
tions paced at two-second intervals by means of a
tape recording. The child is presented with an 
answer sheet containing 100 pairs of digits in sets 
of 10. The child listens to a tape recording that 
speaks one digit every two seconds. The child is 
required to put an X over the one digit in each 
pair that has been heard on the tape recorder. The 
purpose of this test is to determine the extent to 
which the child is able to pay attention to numbers 
spoken on a tape recorder, to keep his place in the 
test, and to make the appropriate responses to 
what he hears from moment to moment. Low 
scores on this test indicate that the subject is not 
up to validly taking the memory for numbers test, 
which follows immediately. High scores on the 
listening-attention test indicate that the subject has
the prerequisite skills for taking the digit-span 
(memory for numbers) test. The listening-atten- 
tion test thus is intended as a means for detecting 
students who, for whatever reason, are unable to 
hear and to respond to numbers read over a tape 
recorder. The test itself makes no demands on 
the child’s memory, only on his ability for listen- 
ing, paying attention, and responding appro- 
priately—all prerequisites for the digit-memory 
test that follows. 

The speed and persistence test (making Xs) 
was always given just before the Level II tests 
(Lorge-Thorndike IQ). The making Xs test is 
intended as an assessment of test-taking motiva-
tion. It gives an indication of the subject’s willing-
ness to comply with instructions in a group-testing
situation and to mobilize effort in following those 
instructions for a brief period of time. The test 
involves no intellectual component, although for 
young children it probably involves some percep- 
tual-motor skills component, as reflected in other 
studies by increasing mean scores as a function
of age between Grades 1-5. Individual differences 
among children at any one grade level would seem 
to reflect mainly general motivation and test- 
taking attitudes in a group situation. Children 
who do very poorly on this test, it can be sus- 
pected, are likely not to put out their maximum 
effort on ability tests given in the same group 
situation, and to that extent, their ability test 
scores are not likely to reflect their real level of 
ability. 

The making Xs test consists of two parts. On 
Part 1 the subject is asked simply to make Xs in 
a series of squares for a period of 90 seconds. In 
this part the instructions say nothing about speed. 
They merely instruct the child to make Xs. The 
maximum possible score on Part 1 is 150, since 
there are 150 squares provided in which the child 
can make Xs. After a two-minute rest period the
child turns the page of the test booklet to Part 2. 
Here the child is instructed to show how much 
better he can perform than he did on Part 1 and 
to work as rapidly as possible. The child is again 
given 90 seconds to make as many Xs as he can in 
the 150 boxes provided. The gain in score from 
Part 1 to Part 2 reflects both a practice effect 
and an increase in motivation or effort as a result 
of the motivating instructions, that is, instruc- 
tions to work as rapidly as possible. 

Level I test. Previous studies have indicated 
that one of the most unambiguous and reliable 
Level I measures is digit-span memory. A specially 
devised test of such memory, which has much 
higher reliability than the short digit-span tests 
included in such general test batteries as the 
Stanford-Binet and the Wechsler, is the author’s 
memory for numbers test. It has three parts. Each 
part consists of six series of digits going from four 
digits in a series up to nine digits in a series. The 
digit series are presented on a tape recording on 
which the digits are spoken clearly by a male voice 
at the rate of precisely one digit per second. The 
subjects write down as many digits as they can 
recall at the conclusion of each series, which is 
signaled by a bong. Each part of the test is pre- 
ceded by a short practice test of three digit series 
in order to permit the tester to determine whether 
the child has understood the instructions, etc. 
The practice test also serves to familiarize subjects 
with the procedure of each of the subtests. The 
first subtest is labeled immediate recall. Here the 
subject is instructed to recall the series imme- 
diately after the last digit has been spoken on the 
tape recorder. The second subtest consists of 
delayed recall. Here the subject is instructed not 
to write down his response until 10 seconds have
elapsed after the last digit has been spoken. The 
10-second interval is marked by audible clicks 
of a metronome and is terminated by the sound 
of a bong, which signals the subject to write his
response. The delayed recall condition invariably 
results in some retention decrement. The third 
subtest is the repeated series test, in which the 
digit series is repeated three times prior to recall; 
the subject then recalls the series immediately 
after the last digit in the series has been presented. 
Again, recall is signaled by a bong. Each repetition 
of the series is separated by a tone with a duration 
of one second. The repeated series almost invaria- 
bly results in greater recall than the single series. 
This test is very culture fair for children in second 
grade and beyond, who know their numerals and 
are capable of listening and paying attention, as 
indicated by the listening-attention test. The 
maximum score on any one of the subtests is 39, 
that is, the sum of the digit series from four 
through nine. Only the total score (i.e., the sum 
of the scores on the three subtests) is used in the 
present study. 
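
The score ceilings stated above follow directly from the series lengths. The short sketch below only reproduces that arithmetic; it assumes one point per digit, as the stated maximum of 39 implies, and the function names are hypothetical. It does not attempt to reproduce the test's actual scoring rules for partially correct responses, which are not described here.

```python
def max_subtest_score(shortest=4, longest=9):
    """Ceiling for one subtest: one point per digit across all series."""
    return sum(range(shortest, longest + 1))


def max_total_score(n_subtests=3):
    """Ceiling for the total score used in the study (sum of three subtests)."""
    return n_subtests * max_subtest_score()


if __name__ == "__main__":
    print(max_subtest_score())  # 4 + 5 + 6 + 7 + 8 + 9
    print(max_total_score())
```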

Level II tests. Level II was measured by the 
Lorge-Thorndike Intelligence Test (Level 3, 
Form B), which has two parts, verbal and non- 
verbal. This is a nationally standardized group- 
administered test of general intelligence. In the
normative population, which was intended to be 
representative of the nation's school population, 
the test has a mean IQ of 100 and a sigma of 16. 
The test is primarily a measure of reasoning 
ability; it has a high general intelligence satura- 
tion when factor analyzed with other mental 
ability tests, so it is deemed a good Level II test, 
especially the nonverbal part, which is based on 
pictorial problems and depends not at all upon
reading skill or scholastic knowledge. 


RESULTS


Control Tests 


On the listening-attention test there was 
no significant difference between the white
and black groups in any grade. The mean 
number correct (out of 100) was above 98 
for all groups in every grade, and the 25th, 
50th, and 75th percentile score was 100 in 
each group at each grade. Since a perfect 
score on this test is 100, it is evident that the 
vast majority of subjects were motivated to 
do their best in the test situation and were 
capable of correctly hearing the numerals as 
spoken over the tape recording and of prop-
erly following directions and registering their
responses on answer sheets. Practically all
subjects obtained a perfect score. At this age
level, there is no appreciable difference be- 
tween the grades or between whites and 
blacks on the listening-attention test. Since 
the correlation between the listening-atten- 
tion test and either the memory test or the 
Lorge-Thorndike Intelligence Tests is not
significantly greater than zero in the white or 
black group, it is clear that no significant 
amount of the variance in these tests is 
attributable to differences in the kinds of 
sustained attentiveness and willingness to 
comply with instructions that are assessed 
by the listening-attention test. 

On the speed and persistence test (making
Xs), the black group scored significantly
higher than the white group on both the first
and second try and on the gain score (i.e., the
difference, second try minus first try),
and these differences are fairly consistent 
across the three grades. These results, 
like those for the listening-attention test, 
indicate that at least equally good coopera- 
tion and effort in the test situation were put 
forth by the black subjects as by the white 
subjects. The lower quartile scores should be 
a most sensitive indicator of children who are
not cooperating or putting out much effort, 
and at every grade the performance of black 
subjects equals or exceeds that of the white 
subjects. These results contradict the com- 
mon notion that black children have a slower 
“personal tempo” or are less cooperative or 
more lackadaisical in a test situation. The 
correlations of making Xs with the
memory for numbers and Lorge-Thorndike
Intelligence Tests are close to zero in both
racial groups. 


Mean White-Black Difference in Memory 
(Level I) and Intelligence (Level II) Tests 


The hypothesis in its most simple and ex- 
treme form states that low- and middle- 
socioeconomic-status groups differ in Level
II but not in Level I ability. Table 1 shows 
the raw score means on the Level II and 
Level I tests in the white and black groups 
and (in the last column) shows the group dif- 
ference in terms of the total within-groups 
variation. We see that although the white- 
black difference is highly significant both on 
the memory and on the intelligence tests, the 
difference on the intelligence tests is more 
than twice the difference on the memory 
test. It is thus unclear whether this finding 
disproves or supports the hypothesis. It 
would seem to disprove the “no difference on
Level I” aspect of the hypothesis, and yet


TABLE 1
RAW SCORE MEANS AND STANDARD DEVIATIONS
ON INTELLIGENCE (LEVEL II) AND MEMORY
TESTS (LEVEL I) AND MEAN WHITE-BLACK
DIFFERENCES IN SIGMA UNITS

                       White             Black
                    (n = 1,489)       (n = 1,123)     Difference
Variable             M       SD        M       SD     (sigma units)
Age (in months)    131.23   10.89    132.61   11.24      −.12
Intelligence
  Verbal            69.85   12.56     46.24   16.88      1.62
  Nonverbal         63.12   10.83     43.47   14.50      1.57
Memory
  Immediate         23.33    6.41     18.75    6.61       .70
  Repeat            26.89    5.81     23.40    6.56       .57
  Delay             24.25    5.76     20.29    6.73       .64
  Total             74.48   15.58     62.45   16.82       .75

Note. Sigma is the square root of the combined
within-groups variance.
the results are consistent with the hypothesis
in that the white-black difference on the 
memory test is very much less than the dif- 
ference on the intelligence test. Since the 
theory also posits a correlation between 
Level I and Level II and a higher correlation 
in higher- than in lower-socioeconomic-sta- 
tus groups, we should expect a Level I differ- 
ence between the present white and black 
groups if the IQ of the white group is above 
the white mean for the general population or 
if the present black group is below the gen- 
eral black mean IQ. In the general popula- 
tion, the groups differ by only about one sigma 
or 16 IQ points, while in our Berkeley pop- 
ulation the difference is considerably greater. 
In terms of the Lorge-Thorndike national 
norms, the results of the present white group 
are verbal IQ, M = 118.4, SD = 15.7; non-
verbal IQ, M = 120.24, SD = 14.6. The
results of the present black group are verbal 
IQ, M = 92.8, SD = 13.9; nonverbal IQ, M 
= 95.4, SD = 15.5. The consequences of this 
difference between the groups used in the 
present study and the averages for the 
general United States population can be 
more easily discussed in the next section in 
connection with the regression of memory 
scores upon intelligence scores. 
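
The sigma-unit differences in Table 1 can be recomputed from the tabled means, standard deviations, and group sizes. The sketch below assumes that the "combined within-groups variance" is the average of the two group variances weighted by group size; that weighting and the function name are assumptions, although with groups of this size other common weightings give practically the same values.

```python
from math import sqrt


def sigma_unit_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Mean difference (a minus b) divided by the square root of the
    size-weighted average of the two within-group variances."""
    within_var = (n_a * sd_a ** 2 + n_b * sd_b ** 2) / (n_a + n_b)
    return (mean_a - mean_b) / sqrt(within_var)


if __name__ == "__main__":
    n_white, n_black = 1489, 1123
    # (label, white mean, white SD, black mean, black SD) from Table 1
    rows = [
        ("Verbal", 69.85, 12.56, 46.24, 16.88),
        ("Nonverbal", 63.12, 10.83, 43.47, 14.50),
        ("Memory total", 74.48, 15.58, 62.45, 16.82),
    ]
    for label, mw, sw, mb, sb in rows:
        d = sigma_unit_difference(mw, sw, n_white, mb, sb, n_black)
        print(f"{label}: {d:.2f}")
```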


Regression of Memory upon Intelligence 


The hypothesis predicts a steeper slope of 
the regression line of Level I (memory) 
scores upon Level II (intelligence) scores in 
the white group than in the black group. 
Figure 2 shows the relevant regression lines 
for the Lorge-Thorndike nonverbal scores. 
The graphical results are practically identical 
for the verbal scores. 

A statistical test of departure from line- 
arity was applied to all the regressions and 
none was found to depart significantly (at 
the .10 level) from linear regression. Though 
the linearity of the regression appeared to 
extend throughout the entire range of scores 
for both racial groups, a total range of more
than 100 IQ points beginning at about an
IQ of 50, the regression lines shown in
Figures 2 and 3 were drawn to extend only 
over the range of scores that permits an 
unequivocal test of departure from linearity.
(The ns at the very extremes of the
distributions [less than the upper and lower
2.5%] are too small and scattered to permit
statistical confidence of linearity in the
regions beyond approximately the middle
95% of the distributions.)

FIGURE 2. Regression of memory scores upon
Lorge-Thorndike nonverbal intelligence raw
scores in white and black groups.

We see in Figure 2 (and in verbal scores as 
well) that the regression is greater for whites 
than for blacks. For the Lorge-Thorndike 
verbal scores, the regression, b, is .58 for 
whites and .42 for blacks. A test² of the
difference between the two slopes shows
it to be highly significant (t = 4.10, p
< .001). For the Lorge-Thorndike nonverbal
scores, the white-black difference in slope
is also highly significant (t = 4.35, p < .001).
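
For readers who wish to see how a difference between two regression slopes may be tested, the following is a minimal sketch of one common procedure: a t test using the pooled residual variance from separate least-squares fits in the two groups, with n1 + n2 - 4 degrees of freedom. It is not claimed to be the test actually used in this study, and the function names and the small data set in the usage example are hypothetical.

```python
from math import sqrt


def _fit(xs, ys):
    """Least-squares fit of y on x; returns (slope, Sxx, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, sxx, sse


def slope_difference_t(x1, y1, x2, y2):
    """Return (t, df) for the difference between two independent slopes."""
    b1, sxx1, sse1 = _fit(x1, y1)
    b2, sxx2, sse2 = _fit(x2, y2)
    df = len(x1) + len(x2) - 4            # two parameters estimated per group
    pooled_var = (sse1 + sse2) / df       # pooled residual variance
    se_diff = sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return (b1 - b2) / se_diff, df


if __name__ == "__main__":
    # Hypothetical toy data: same predictor values, slopes of 2 and 1.
    x = [0, 1, 2, 3]
    group_a = [1, 1, 3, 7]
    group_b = [1, 0, 1, 4]
    print(slope_difference_t(x, group_a, x, group_b))
```

The resulting t is referred to the t distribution with the returned degrees of freedom; with samples as large as those in the present study this is effectively a normal deviate.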

According to the norms provided by the 
Lorge-Thorndike test manual, the point on 
the scale of raw scores at which the regres- 
sion lines for the present white and black 
groups cross over is equivalent to a Lorge- 
Thorndike IQ of approximately 100, both 
for the verbal and the nonverbal tests. In the 
range of intelligence below an IQ of 100, the 
black children, on the average, surpass white 
children in memory scores; in the range 
above an IQ of 100, the white children sur- 
pass the black children in memory per- 
formance. The crossover in the above-aver- 
age IQ range is clearly not a statistical 
artifact as was originally believed when only
small sample data based on selected groups 
were available. These results mean that, on 
the average, the white child below an IQ of 
approximately 100 has a poorer memory 
span than his black counterpart in IQ, and 
the white-black difference increases, in 
favor of the black child, the lower the IQ. In 
terms of national IQ norms, the approxi- 
mately 80% to 85% of black children who 
fall below an IQ of 100 would, on the aver- 
age, surpass in memory span the 50% of 
white children who fall below an IQ of 100. 
If we assume that the white and black re- 
gressions in the general United States popu- 
lation are the same as those in the present 
data and if the general white and black IQ 
means are 100 and 85, respectively, then, 
according to the regression equations in the 
present data, we should expect the white and 
black populations (which differ 1 sigma in 
IQ) to differ by only about .3 sigma to .4 
sigma in memory span, in favor of whites.
That is to say, the present data do not sup- 
port the hypothesis of no white-black dif- 
ference in Level I (here measured by the 
memory test) but the data do indicate a 
much smaller racial difference in memory 
than in IQ. This conclusion would, of course, 
not hold if the relative slopes of the regres- 
sion lines for the two races are not about the 
same in the general population as in the
Berkeley school population. The rather 
atypical nature of the Berkeley population 
with respect to mean Lorge-Thorndike IQ, 
especially in the white population, should 
make us wary of generalizing to the general 
population or to the populations of other 
communities with markedly different demo-
graphic and socioeconomic-status features
than Berkeley.


Regression of Intelligence upon Memory


These regression lines present a very dif-
ferent picture from that of the regression of
memory upon intelligence. As seen in Figure 
3, the slopes of the regression lines for whites 
and blacks are parallel (the regression coeffi- 
cients do not differ significantly), and they 
are separated by approximately 1.6 sigmas on 
the intelligence scales. (The results are 
virtually identical for the verbal scores.) 
Thus there is no point on the scale of memory 
scores at which equated groups of whites and
blacks obtain equal intelligence scores. The 
picture is close to the hypothetical regression 
lines depicted in Figure 1. It would seem to 
be consistent with the hypothesis that 
Level I is necessary but not sufficient for the 
development and functioning of Level II. 
Why should white and black children with 
precisely the same memory performance dif- 
fer by 1.6 sigmas on both the verbal and non- 
verbal intelligence measures? When matched 
for intelligence, on the other hand, whites 
and blacks are considerably more alike in 
memory, and they average just about the 
same in memory performance when matched 
on intelligence in the vicinity of an IQ of 100. 
In other words, it appears that if subjects 
have the intelligence, they have the memory; 
while if they have the memory, they do not 
necessarily have the intelligence. 


Dispersion of Memory Ability as a Function 
of Intelligence 


If it is true that intelligence depends upon 
memory but that the reverse does not hold, 
we should expect the dispersion of memory 
scores to show a systematic decrease going
from low to high levels of intelligence. To 
test this hypothesis, the standard error of 
estimate of memory scores (i.e., the standard 
deviation of memory scores around the 
regression line of memory upon intelligence) 
was examined for systematic change over the 
full range of Lorge-Thorndike Intelligence 
Test scores, verbal and nonverbal. The
results are shown in Figure 4. Since the
standard errors of estimate (indicated by
circles) are rather erratic, their trend is
better indicated by a moving average (the
line going through the data points). For the
nonverbal test the trend is clearly in accord
with the hypothesis; that is, the standard
error of estimate of the memory scores
systematically decreases with increasing
nonverbal intelligence. Bartlett's test for
homogeneity of variances and a test of trend
are both significant (p < .01) for whites
and for blacks. The results for the verbal
test, however, yield only a faint suggestion of
a decreasing standard error of estimate, and
the trend is nonsignificant.

Figure 4. Memory score dispersion (standard
error of estimate) as a function of Lorge-Thorndike
verbal and nonverbal raw scores in white and
black groups.
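
The dispersion analysis described above can be sketched in a few lines. What follows is only an illustration on synthetic data, not the original computation: the function names (fit_line, dispersion_by_level, moving_average), the band edges, and the generated scores are all assumptions made for the example, and Bartlett's test is not included.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for the regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dispersion_by_level(x, y, edges):
    """Root-mean-square deviation of y around the fitted line within each [lo, hi) band of x."""
    slope, intercept = fit_line(x, y)
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y) if lo <= xi < hi]
        result.append(math.sqrt(sum(r * r for r in resid) / len(resid)))
    return result


def moving_average(values, window=3):
    """Unweighted moving average used to smooth an erratic series."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Synthetic scores (hypothetical): the scatter of "memory" around the line
# shrinks as "intelligence" rises, which is the pattern the hypothesis predicts.
intelligence = list(range(60))
memory = [0.5 * xi + (1 if xi % 2 else -1) * (60 - xi) / 10 for xi in intelligence]

sds = dispersion_by_level(intelligence, memory, [0, 20, 40, 60])
print(sds)
print(moving_average([1, 2, 3, 4, 5]))
```

In this sketch a decreasing sequence of band dispersions corresponds to the downward trend reported for the nonverbal test.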

Thus the prediction based on the hy- 
pothesized hierarchical relationship between 
Level I and Level II is borne out by the non- 
verbal but not by the verbal test. Why 
should the two tests differ in this way? One 
can only speculate at this point. A possibility
is that while both tests are highly saturated
with general intelligence, the nonverbal test
is more a measure of what Cattell (1971,
chap. 5) calls “fluid” intelligence and the
verbal test is more a measure of “crystallized”
intelligence. The hypothesized
hierarchical relationship between Level I 
and Level II may hold only for Level II as 
measured by tests of fluid intelligence. But 
this conjecture cannot be tested with the 
present data and must await a study spe- 
cially designed for this purpose. 


Socioeconomic-Status Differences within 
Racial Groups 


A questionnaire sent to the home of every 
child in the study, as well as the school 
records, served as the basis for classifying 
subjects according to socioeconomic status. 
Among other items of family background 
information obtained from the parents was 
the current occupation (or last job held in 
case of the unemployed) of the head of the 
household. Since returns of the parental 
questionnaire were considerably less than 
100%, especially in the black group, and 
not all the questionnaires that were returned 
had answered the occupation question, 
the sample size for the socioeconomic-status 
analysis was reduced and the remaining 
subjects cannot be regarded strictly as a 
random sample of the Berkeley school popu- 
lation because of the self-selection in an- 
swering the questionnaire. When parent’s 
occupation was not given in the question- 
naire, it was sought in school records, but 
often without success. If the parental oc- 
cupation appeared in the school records, it 
almost invariably was given in the question- 
naire and vice versa. Lack of information or 
ambiguity or doubt in the socioeconomic- 
status classification of a given occupation 
was cause for omitting subjects from the 
present analysis. 

Parental occupations were first coded into 
82 job description categories. These were 
then reduced to seven categories in terms of 
conventional socioeconomic-status rankings 
of the occupations. But in order to obtain 
large enough socioeconomic-status samples 
to allow tests of Level I-Level II correla- 
tions and regressions within each sample, 
these seven categories had to be reduced to 
three broad socioeconomic-status categories 
as follows: 




High Socioeconomic Status
1. High-level administrators, supervisors,
college teachers.
2. High-level professionals, engineers,
physicians, etc.
Middle Socioeconomic Status
3. White collar occupations requiring
college or technical training.
4. Self-employed, technicians, skilled
craftsmen.
5. Merchants, managers of small business,
service workers, contractors.
Low Socioeconomic Status
6. Manual workers.
7. Nonmanual workers, relatively unskilled,
jobs ordinarily requiring less
than a high school diploma.

The categories are admittedly crude and 
somewhat arbitrary, but would undoubtedly 
correlate highly with any of the various 
methods of socioeconomic-status classifica- 
tion. 

Table 2 gives the means and standard 
deviations of the three socioeconomic-status 
groups within each race. The row labeled 
“total” is based on the subjects who were 
classifiable. The “population” row consists 
of all subjects on whom test data were
available, whether they were classifiable by
socioeconomic status or not. It can be seen 
that the total and population values do not 
differ appreciably in means or standard 
deviations, which indicates that the subjects 
who were classified by socioeconomic status 
are a fairly representative sample of the 
school population, at least as regards the 
present test variables. 

Though there are the expected differences 
between each of the socioeconomic-status 
levels, among whites the largest differences 
are seen between the middle and low groups, 
while among blacks the largest differences 
are between the high- and middle-socioeconomic-status
levels. But this difference between
the racial groups is of little significance,
since whites and blacks are not perfectly
matched for occupations within the
three broad socioeconomic-status categories. 
The average racial difference (see the last
column of Table 2) within each socioeconomic-status
level is slightly larger than the
high-low socioeconomic-status differences
within each race for the Lorge-Thorndike
verbal and nonverbal scores. For the memory
scores, on the other hand, the high-low
socioeconomic-status difference within each
racial group is greater than the difference
between the racial groups. Expressed in
units of the white population sigma, the
high-low socioeconomic-status difference
for whites on the Lorge-Thorndike verbal =
.90, on nonverbal = .90, on memory = .59;
the corresponding figures for blacks are
1.02, .86, and .52. In both racial groups, the
high-low socioeconomic-status difference is
almost twice as great for the intelligence
tests as for the memory test, which accords
with the hypothesis at least in a directional
sense, that is, Level II ability is more highly
correlated with socioeconomic status than is
Level I ability.


TABLE 2


MEANS AND STANDARD DEVIATIONS OF INTELLIGENCE AND MEMORY RAW SCORES OF SOCIOECONOMIC-
STATUS LEVELS WITHIN RACIAL GROUPS AND MEAN DIFFERENCE IN SIGMA UNITS


                                |        White          |              Black                  |
Test and status                 |     n |    M |    SD  |     n |      M      |     SD      | (M_W − M_B)/σ
Lorge-Thorndike verbal
  High                          |   763 | 71.6 | 10.40  |    38 | 58.4        | 13.22       | 1.25
  Middle                        |   287 | 70.9 |  9.98  |    43 | 50.5        | 15.88       | 1.87
  Low                           |   215 | 60.3 | 16.42  |   414 | 45.7        | 15.99       |  .90
  Total^a                       | 1,265 | 69.6 | 12.24  |   495 | 47.2        | 16.19       | 1.66
  Population^b                  | 1,489 | 69.9 | 12.56  | 1,123 | [illegible] | [illegible] | [?].63
Lorge-Thorndike nonverbal
  High                          |   763 | 65.3 | [illegible] | 38 | [illegible] | [illegible] | [illegible]
  Middle                        |   287 | 63.6 |  9.61  |    43 | 46.6        | 12.04       | 1.71
  Low                           |   215 | 55.6 | 14.82  |   414 | 43.0        | 13.63       |  .90
  Total^a                       | 1,265 | 63.4 | 10.72  |   495 | 44.1        | 13.59       | 1.66
  Population^b                  | 1,489 | 63.1 | 10.[?] | 1,123 | [illegible] | [illegible] | [?].56
Memory (total score)
  High                          |   763 | 74.7 | 15.28  |    38 | [illegible] | [illegible] | [illegible]
  Middle                        |   287 | 73.7 | 14.01  |    43 | 63.4        | 16.65       | [illegible]
  Low                           |   215 | 65.5 | 15.28  |   414 | 60.9        | 16.76       |  .28
  Total^a                       | 1,265 | 72.9 | 15.34  |   495 | 61.7        | 16.86       | [illegible]
  Population^b                  | 1,489 | 74.5 | 15.58  | 1,123 | 62.5        | 16.82       | [illegible]


Note. (M_W − M_B)/σ, where σ is the square root of
the combined within-groups variance.
^a Total of all subjects who were classified by socioeconomic status.
^b The entire school population in Grades 4-6.
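
As a worked check of the sigma-unit arithmetic just described, the short sketch below recomputes the white high-low differences from the Table 2 means and the white population standard deviations. It is an illustration only; the dictionary and function names are assumptions introduced for the example, and only values legible in Table 2 are used.

```python
# Values as printed in Table 2 (white groups only).
WHITE_POPULATION_SD = {"verbal": 12.56, "memory": 15.58}
WHITE_MEANS = {
    "verbal": {"high": 71.6, "low": 60.3},
    "memory": {"high": 74.7, "low": 65.5},
}


def high_low_difference(test):
    """High-low status difference expressed in white population sigma units."""
    means = WHITE_MEANS[test]
    return (means["high"] - means["low"]) / WHITE_POPULATION_SD[test]


for test in WHITE_MEANS:
    print(test, round(high_low_difference(test), 2))
```

Rounded to two places, these reproduce the .90 (verbal) and .59 (memory) figures quoted in the text for whites.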

Correlations and regressions within socio- 
economic-status groups. Table 3 shows the 
correlation and regression of memory upon 
intelligence in each of the socioeconomic- 
status groups by race. The theory predicts 
higher correlations and regression coefficients 
in upper- than in lower-socioeconomic-status 
groups. This is not completely borne out by 
the data. The white-socioeconomic-status 
groups show no systematic trends in this 
respect, but the black-socioeconomic-status 
groups show the predicted trend, that is, 
lower correlations and regressions with lower 
socioeconomic status. The black-high- and
middle-socioeconomic-status groups both
appear quite different from the black-low-socioeconomic-status
group. The differences
of regression coefficients between all 15 possible
contrasts of Socioeconomic Status ×
Race groups in Table 3 were subjected to t
tests (two-tailed) to determine their significance.
Only three of the contrasts are significant
beyond the .05 level (two-tailed), all
involving only the nonverbal test:
high-socioeconomic-status white versus
low-socioeconomic-status black (p < .01);
low-socioeconomic-status white versus
low-socioeconomic-status black (p < .01);
high-socioeconomic-status black versus
low-socioeconomic-status black (p < .03).


TABLE 3


CORRELATIONS (r) AND REGRESSION COEFFICIENTS
(b) OF MEMORY SCORES UPON LORGE-THORNDIKE
VERBAL AND NONVERBAL RAW SCORES IN
SOCIOECONOMIC-STATUS LEVELS WITHIN
RACIAL GROUPS


                |            White            |            Black
Socioeconomic   |   Verbal    |  Nonverbal    |   Verbal    |  Nonverbal
status          |   r  |   b  |   r  |   b    |   r  |   b  |   r  |   b
High            | .376 | .551 | .361 | .628   | .552 | .673 | .594 | .843
Middle          | .414 | .581 | .324 | .472   | .408 | .428 | .496 | .686
Low             | .591 | .550 | .605 | .624   | .391 | .410 | .311 | .382
Total^a         | .464 | .582 | .442 | .632   | .419 | .436 | .365 | .453
Population^b    | .466 | .578 | .443 | .637   | .420 | .419 | .372 | .436


^a Total of all subjects who were classified by
socioeconomic status (white, n = 1,265; black,
n = 495).

^b The entire school population in Grades 4-6
(white, n = 1,489; black, n = 1,123).
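
The count of 15 contrasts follows from pairing the six Socioeconomic Status × Race groups in every possible way. The sketch below simply enumerates those pairs; the variable names are assumptions made for the illustration, and no t tests are computed.

```python
from itertools import combinations

RACES = ("white", "black")
STATUS_LEVELS = ("high", "middle", "low")

# Six groups: every status level within each race.
groups = [(status, race) for race in RACES for status in STATUS_LEVELS]

# Every unordered pair of distinct groups is one contrast.
contrasts = list(combinations(groups, 2))

# Contrasts in which both groups belong to the same race.
within_race = [pair for pair in contrasts if pair[0][1] == pair[1][1]]

print(len(groups), len(contrasts), len(within_race))
```

Of the 15 pairs, 6 are within-race contrasts (3 per race) and the remaining 9 compare a white group with a black group.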

All the significant differences involve ex- 
clusively the low-socioeconomic-status-black 
group, and the only significant within-race 
socioeconomic-status difference is between 
high- and low-socioeconomic-status blacks. 
The difference in regressions, therefore, ap- 
pears to involve race more than socioeco- 
nomic status, or a combination of race 
and socioeconomic-status effects, since the 
low-socioeconomic-status-black group is un- 
doubtedly somewhat below the low-so- 
cioeconomic-status-white group in socio- 
economic status. The regressions of the 
high- and middle-socioeconomic-status-black
groups do not differ significantly from those 
of the white groups. 


Discussion 


The present study examined three main 
aspects of the Level I-Level II theory of 
mental ability, namely (a) the relative 
magnitudes of socioeconomic status and 
white-black differences in Level I and Level 
II abilities, (b) socioeconomic status and 
racial differences in the correlation between 
Levels I and II and in the regression of 
Level I upon Level II, and (c) the hierarchical
(i.e., necessary-but-not-sufficient)
functional dependence of Level II perfor- 
mance on Level I ability. 

(a) The theory as originally stated in its
simplest form predicts a socioeconomic- 
status difference in Level II ability but not 
in Level I ability. This formulation, how- 
ever, was intended more as an unambiguous 
basis for a directional prediction than as a 
precise expectation of reality, for in reality 
it is, of course, most improbable that there is 
“no difference” between any two populations 
in any given trait. So the realistic issue is the
relative magnitudes of differences between
populations in Levels I and II. In accordance 
with previous findings, it was found that the 
white and black groups, and to a slightly 
lesser degree the high- and low-socioeco- 
nomic-status groups within each race, dif- 
fered much more, on the average, in Level II 
than in Level I ability. The exact size of the 
differences, of course, depends upon the 
particular populations being compared and 
is not regarded as an intrinsic aspect of the 
theory, the main point of which is that 
populations can differ in these two classes of 
ability and that the direction of the dif- 
ference in socioeconomically stratified popu- 
lations is such that the higher and lower 
groups will show a greater difference on 
Level II than on Level I. The reason for this, 
according to the theory, is that social mo- 
bility in an industrialized society is more 
dependent upon Level II than upon Level I 
abilities. In the present study the white-black
differences are larger than the socioeconomic-status
differences within the racial
groups, but the point is ambiguous here since 
the average socioeconomic-status difference 
between the races is probably greater than 
the high-low socioeconomic-status dif- 
ferences within each racial group. The strict 
criteria for socioeconomic-status classifica- 
tion used here resulted in the inclusion of a 
peculiarly small percentage of the black 
population in the high- and middle-socio- 
economic-status categories. It would be 
advisable in future studies to have socio- 
economic-status ratings on a continuous 
scale based on a large number of home back- 
ground factors, which might reflect more 
closely the nature of the child’s environment 
than does merely the occupational classifi- 
cation of his parents. 

(b) The hypothesized higher correlation
between Levels I and II in the white than in 
the black group was fully borne out by the 
data, as was also the predicted higher regres- 
sion of Level II upon Level I. The effect is 
largely attributable to the difference be- 
tween the entire white sample and the 
low-socioeconomic-status-black group, which 
constituted the vast majority of the present 
black sample. The high-socioeconomic-status-
and middle-socioeconomic-status-black groups
do not differ significantly from
the white population in this respect but differ
significantly from the low-socioeconomic- 
status-black group. 

The cause of different Level I-Level II 
correlations (or regressions) in different 
populations has not yet been established
and at present can only be hypothesized.
There are several possible causes of correlation,
and they are not mutually exclusive: (a)
part-whole functional dependence, that is,
one behavior may be a subunit of some other
behavior, such as shifting gears smoothly and
passing a driver’s test consisting of driving in
traffic with an examiner present; (b) hierarchical
functional dependence, that is, one
behavior is prerequisite to another or one is
functionally dependent upon another, for example,
skill in working problems in long division is
dependent upon skill in multiplication; (c)
environmental correlation between the behaviors,
that is, cultural contingencies may
be such that when one behavior is learned
another is also likely to be learned, even
though there is no functional connection
between the two behaviors, for example, a
knowledge of baseball and a knowledge of
football; and (d) genetic correlation between
behaviors due to common assortment of
their genetic underpinnings through selection
and homogamy and pleiotropism (one gene
having two or more phenotypic effects).
The rather low degree of correlation between 
our Level I and Level II tests suggests that 
there is little functional dependence, and
this could be proved conclusively if one
could find a group of subjects that reliably
showed a zero correlation between Level I
and Level II. The fact of quite large and
significant differences in Level I-Level II
correlations in various populations is also
inconsistent with wholly functional or part-whole
dependence as a cause of the correlation.
Some substantial part of the correlation,
therefore, must be attributable to other
causes. If the cause is common environmental
influences on the Level I and Level II tests, 
it is hard to imagine what these influences 
might be and why, if they are common, there 
should be such large group mean differences 
in Level II ability and not in Level I. The 
most reasonable hypothesis at this point 
would seem to be that the correlation is due 
only slightly to functional dependence of
Level II upon Level I and mostly to a common
genetic assortment on both factors,
that is, a genetic correlation in the popula- 
tion between two broad classes of ability 
with different genetic underpinnings. If this 
were the case, we might find a wide range of 
correlations in different populations; one 
conceivably might even find a group in 
which the correlation is negative. This would 
tend to rule out pleiotropy and would sug- 
gest independent mechanisms under in- 
dependent genetic control underlying Level 
I and Level II. Specially designed studies 
would be required to test such a hypothesis. 

(c) The test of the hypothesis of hierarchical
dependence of Level II upon Level I yielded 
significant evidence consistent with the 
hypothesis in the case of the nonverbal 
intelligence test but not the verbal. In any 
case, there does not appear to be evidence of 
any strong degree of functional dependence 
between the abilities; quite low or high scores 
on the one ability are not incompatible with 
a high or low score on the other, though 
there is a tendency for low intelligence — high 
memory to be more frequent than the op- 
posite combination of abilities, especially for 
nonverbal intelligence. 

In the present study, Level I ability was 
measured by three slightly differing forms of 
a single type of test—digit-span memory. 
In other studies different tests have been
used (paired-associate learning, serial learning,
and free recall of pictures and objects),
all with similar results generally consistent
with the formulation of the two-level theory.


STIMULUS AND RESPONSE VARIABLES IN CHILDREN’S
LEARNING OF GRAPHEME-PHONEME
CORRESPONDENCES


GEORGE MARSH¹ AND PETER DESBERG
California State College, Dominguez Hills
LAWRENCE KENT FARWELL
University of California, Santa Barbara


The effects of stimulus and response abstractness and similarity were
investigated in children’s paired-associate learning of grapheme-phoneme
correspondences. In the first experiment, stimulus and response
abstractness were investigated in a factorial design. The
results indicated that the major source of learning difficulty was
response abstractness. In a second study, stimulus and response
similarity were also varied factorially. Neither factor had a significant
effect on the number of correct responses. The implications of the
results for teaching grapheme-phoneme correspondences were discussed.

The purpose of the present study was to
investigate the effects of stimulus and response
abstractness and similarity on children’s
learning of grapheme-phoneme correspondences.
Both of these factors are
known to be important in adult
paired-associate learning, but little research
on these variables has been conducted with 
children. Keppel (1964), in his review of 
children’s verbal learning, commented upon 
the absence of studies investigating these 
factors with children. 

Since Keppel's (1964) article, studies by 
Paivio and Yuille (1966) and Dilley and 
Paivio (1968) have investigated the effects 
of meaningfulness and abstractness on chil- 
dren’s paired-associate learning. However, 
no study has investigated the effects of 
these variables in the context of tasks 
directly related to beginning reading. 

Although young children perform relatively
well on paired-associate learning of
concrete materials such as picture-picture
pairs (e.g., Jensen and Rohwer, 1965), they
have considerable difficulty in learning
grapheme-phoneme correspondences that
from their point of view are relatively
abstract and meaningless (Calfee, Chapman,
& Venezky, 1972; Marsh & Sherman, 1969).


¹ Requests for reprints should be sent to George
Marsh, California State College, 1000 East Victoria
Street, Dominguez Hills, California 90246.

Feldman, Johnson, and Mast (1972) have 
recently reported that performance on 
concrete paired-associate tasks is unrelated 
to school achievement, while performance
on abstract tasks shows a substantial correlation
with school achievement. This
would be expected since most paired-associate
tasks in school are relatively
abstract.


EXPERIMENT 1 


The purpose of the first study was to
determine whether children's relatively poor
performance on grapheme-phoneme correspondence
tasks is primarily a function of
stimulus or response concreteness. A factorial
design varying stimulus and response
concreteness, similar to that used
by Cieutat, Stockwell, and Noble (1958)
with adults and Dilley and Paivio (1968)
with children, was employed. This involves
four combinations of stimulus and response
concreteness: high-high, high-low, low-high,
and low-low. In the high-high group, the
stimulus and response items were both
familiar pictures (Group Picture-Picture);
in the high-low group, the stimulus was a
picture and the response was a letter sound
(Group Picture-Letter); in the low-high
group, the stimulus was a letter and the
response item was a picture (Group Letter-Picture);
and in the low-low group, the
stimulus was a letter and the response its
appropriate letter sound (Group Letter-Letter).


Method


Subjects. The subjects were 40 nursery school
children between the ages of 53 and 65 months 
with a mean age of 61 months. Each child was pre- 
tested for knowledge of grapheme-phoneme pairs. 
Three subjects were excluded from the experiment 
on this basis. 

Stimulus materials. The stimulus materials for
the abstract conditions were 12 lowercase graphemes
approximately 2 inches high printed on 4 × 6
inch cards. The graphemes consisted of 6 stop
consonants (p, t, k, b, d, and g) and 6 continuant
consonants (f, v, s, n, z, and m). The responses in
this condition consisted of the appropriate phoneme
associated with each grapheme. The stop
phonemes were followed by a reduced schwa (e.g.,
/pə/) since they are impossible to produce in complete
isolation. The stimulus materials for the
concrete conditions consisted of 12 pictures of
common objects taken from a children's alphabet
book (telephone, bird, lion, apple, keys, flowers,
shoes, doll, clock, house, rabbit, and balloons).

Procedure. The subjects were randomly assigned
to one of the conditions (picture-picture, picture-letter,
letter-picture, or letter-letter) with the restriction
of equal ns in each group (n = 10). When the
subject was seated across from the experimenter,
he was pretested for knowledge of grapheme-phoneme
pairs and then given the following instructions.


We're going to play a game. First I'll tell you
what goes with each of the pictures (letters).
Then when I show you the picture (letter) again,
you tell me what goes with it. After you answer
I'll tell you what goes with the picture (letter).


The experimenter then selected a list of 6 items
previously made up from the pool of materials
available. The type of list constructed depended
on the group to which the subject was assigned. In
the picture-picture condition, the 12 pictures were
randomly paired to form the 6-item list; in the
picture-letter condition, 6 pictures were randomly
paired with 6 of the 12 phonemes; in the letter-picture
condition, 6 graphemes were randomly
paired with 6 pictures; and in the letter-letter
condition, 6 of the 12 grapheme-phoneme pairs
were selected randomly.
The subjects were then given a familiarization
trial (during which the experimenter pronounced
the phonemes and the picture name responses associated
with each stimulus). Following the familiarization
trial, each subject was given 10 training
trials by a method of anticipation in which the
child was given approximately 10 seconds to
respond. After a response, or after the anticipation
period had elapsed, the experimenter presented the
correct stimulus-response pair. The responses were
recorded as either correct, omission, or intrusion.
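
Random assignment with the restriction of equal ns, as described under Procedure, can be sketched as a shuffle followed by dealing subjects into equal-sized groups. This is a hypothetical illustration rather than the authors' actual procedure; the function name, the seed, and the subject numbering are assumptions made for the example.

```python
import random

CONDITIONS = ("picture-picture", "picture-letter", "letter-picture", "letter-letter")


def assign_equal_groups(subject_ids, conditions=CONDITIONS, seed=None):
    """Shuffle the subjects, then split them so every condition receives the same n."""
    subjects = list(subject_ids)
    if len(subjects) % len(conditions) != 0:
        raise ValueError("number of subjects must be a multiple of the number of conditions")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    per_group = len(subjects) // len(conditions)
    return {
        condition: subjects[i * per_group:(i + 1) * per_group]
        for i, condition in enumerate(conditions)
    }


# 40 hypothetical subject numbers, 10 per condition.
assignment = assign_equal_groups(range(1, 41), seed=0)
for condition, members in assignment.items():
    print(condition, len(members))
```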


Results 


The mean number of correct responses, 
omission errors, and intralist intrusion 
errors are shown in Table 1. 

The correct responses were analyzed in a
2 × 2 × 10 factorial analysis of variance.
The between-subjects factors were stimulus
and response abstractness and the within-subjects
factor was trials. The effect of
stimulus abstractness was not significant
(F < 1). The effect of response abstractness
was significant (F = 41.49, df = 1/36, p <
.001).

The trials effect was significant (F =
32.50, df = 9/324, p < .001). The interaction
between trials and stimulus abstractness
was not significant (F = 1.19,
df = 9/324) but the interaction between
type of response and trials was significant
(F = 3.98, df = 9/324, p < .01).
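
The degrees of freedom reported above follow from the design: two between-subjects factors with two levels each, 10 subjects per cell, and 10 trials as the within-subjects factor. The sketch below reproduces that bookkeeping; the function and key names are assumptions introduced for the illustration.

```python
def mixed_design_df(levels_a, levels_b, n_trials, n_per_cell):
    """Degrees of freedom for an A x B between-subjects design with trials repeated within subjects."""
    n_subjects = levels_a * levels_b * n_per_cell
    between_error = n_subjects - levels_a * levels_b  # subjects within cells
    trials = n_trials - 1
    return {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "between_error": between_error,
        "trials": trials,
        "trials_x_A": trials * (levels_a - 1),
        "within_error": trials * between_error,
    }


df = mixed_design_df(levels_a=2, levels_b=2, n_trials=10, n_per_cell=10)
print(df)
```

With these inputs the between-subjects effects are tested on 1/36 degrees of freedom and the trials effects on 9/324, matching the values in the text.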

The omission errors were also analyzed
in a 2 × 2 factorial analysis of variance.
Again, the effect of the stimulus factor was
not significant (F < 1). The effect of the response
factor was significant (F = 42.8, df =
1/36, p < .001). The interaction between
stimulus and response factors was not
significant (F < 1).


TABLE 1
MEAN NUMBER OF CORRECT RESPONSES,
OMISSION ERRORS, AND INTRALIST
INTRUSION ERRORS


                            | Correct   | Omission | Intrusion
Group                       | responses | errors   | errors
Experiment 1
  Letters-letter sounds     |   20.6    |  27.7    |  11.7
  Pictures-letter sounds    |   16.2    |  31.9    |  11.9
  Letters-picture names     |   41.0    |   8.3    |  10.7
  Pictures-picture names    |   46.9    |   7.9    |   5.2
Experiment 2
  Low visual-low acoustic   |   17.5    |  13.7    |   8.8
  High visual-low acoustic  |   15.2    |  15.5    |   9.3
  Low visual-high acoustic  |   16.0    |  12.8    |  11.2
  High visual-high acoustic |   12.7    |  12.9    |  14.4

A similar 2 × 2 factorial analysis of
variance was carried out on the intrusion 
errors. Neither the main effects nor the in- 
teraction was significant. 
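
Because each subject in Experiment 1 learned a 6-item list for 10 training trials, the three response categories in Table 1 should sum to 60 for every group. The sketch below runs that consistency check on the Experiment 1 rows; the variable names are assumptions made for the illustration.

```python
ITEMS_PER_LIST = 6
TRAINING_TRIALS = 10

# (correct responses, omission errors, intrusion errors), as printed in Table 1.
table_1_experiment_1 = {
    "Letters-letter sounds": (20.6, 27.7, 11.7),
    "Pictures-letter sounds": (16.2, 31.9, 11.9),
    "Letters-picture names": (41.0, 8.3, 10.7),
    "Pictures-picture names": (46.9, 7.9, 5.2),
}


def row_total(row):
    """Sum of the three response categories, rounded to the table's one decimal place."""
    return round(sum(row), 1)


expected_total = ITEMS_PER_LIST * TRAINING_TRIALS
for group, row in table_1_experiment_1.items():
    print(group, row_total(row), row_total(row) == expected_total)
```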


EXPERIMENT 2 


The purpose of the second study was to 
investigate the effect of intralist stimulus 
and response similarity on children’s learn- 
ing of grapheme-phoneme correspondences. 
The design of the study was similar to that 
of the first experiment except that stimulus 
and response similarity, rather than abstractness,
were combined factorially to
form four conditions: high visual and high 
acoustic similarity (Group High Visual- 
High Acoustic); high visual and low acoustic 
similarity (Group High Visual-Low Acous- 
tic); low visual and high acoustic similarity 
(Group Low Visual-High Acoustic); and 
low visual and low acoustic similarity 
(Group Low Visual-Low Acoustic). 


Method 


Subjects. The subjects were 40 nursery school 
children between the ages of 51 and 64 months 
with a mean age of 58 months. As in the first ex- 
periment, the subjects were screened for their 
knowledge of the grapheme-phoneme correspon- 
dences used in the study. Two children were ex- 
cluded from the study on this basis. 

Stimulus materials. The paired-associate lists 
were four-item grapheme-phoneme pairs that were 
selected on the basis of graphemic similarity using 
the data of Dunn-Rankin (1968) to scale visual 
similarity and the data on isolated phoneme dis- 
criminability collected by Marsh and Sherman 
(1971) to scale acoustic similarity. The lists were 
constructed so that each grapheme would be
visually similar to one other grapheme in the
high-visual-similarity list and each phoneme would be
acoustically similar to one other phoneme in the
high-acoustic-similarity list (see Table 2).

The stimuli were lowercase letters approximately
2 inches high printed on 4 × 6 inch cards.
The responses consisted of the pronunciation of
the letter sounds (phonemes). The phonemic response
to the grapheme t in Group High Visual-Low
Acoustic and to the h in Group Low Visual-High
Acoustic was the fricative /θ/, which is
spelled with a th and is acoustically confusable
with /f/.

Procedure. The procedures for assigning subjects
to groups, pretesting for knowledge of letter
sound associations, and training by the method of
anticipation were identical to procedures followed
in the first experiment.


Results 


The mean number of correct responses,
omission errors, and intralist intrusion
errors for the four groups are shown in Table
1. A 2 × 2 × 10 analysis of variance
showed that neither stimulus similarity
(F = 1.76, df = 1/36, p > .10) nor response
similarity (F = 1.04, df = 1/36, p > .10)
was significant. The trials effect was significant
(F = 19.69, df = 9/324, p < .001), but
there was no significant interaction between
trials and the stimulus and response
variables.

A 2 × 2 factorial analysis of variance of
the omission errors showed no significant
main effects or interactions.

A 2 × 2 factorial analysis of variance of
the intrusion errors showed that the main
effect for stimulus similarity was not significant
(F = 1.50, df = 1/36), while the
main effect for response similarity was significant
(F = 6.36, df = 1/36, p < .05).




TABLE 2
PAIRED-ASSOCIATE LISTS AND RESPONSES FOR THE VISUAL-ACOUSTIC SIMILARITY GROUPS

Similarity group           | List              | Response
High visual-high acoustic  | [illegible]       | [illegible]
High visual-low acoustic   | [illegible]       | [illegible]
Low visual-high acoustic   | s, f, [illegible] | [illegible]
Low visual-low acoustic    | b, s, m, k        | [illegible], /k/


poor performance on grapheme-phoneme
correspondence pairs as compared with
concrete tasks such as picture-picture pairs
is response availability. It made no sig- 
nificant difference if the stimuli were graph- 
emes or familiar pictures, but it made a 
large difference in performance if the re- 
sponses were phonemes as opposed to picture 
names. This conclusion is supported by both 
the analysis of correct responses and the 
analysis of the omission errors. These results 
are in agreement with the results of most 
adult studies of paired-associate learning. 
As Underwood and Schultz (1960) point 
out, the subject does not have to produce 
the stimulus but does have to produce the 
response, thus implicating response availability
as the major factor in paired-associate
task difficulty. However, the
present results are in conflict with a previous
study by Dilley and Paivio (1968),
which found that pictures as stimuli significantly
improved performance as compared
to spoken words as stimuli, while
pictures as response items retarded learning
as compared to spoken words as responses.
The difference in the outcome of this study
and that of Dilley and Paivio (1968) is most
probably a function of the materials used.
While pictures as stimuli may mediate
association to spoken words, it is hard to
see how they could have this effect when
responses are phonemes. Spoken words as
responses are apparently more effective
than pictures as response items because the
subject was required to produce a picture
name (i.e., a spoken word) rather than an
image of the picture in the Dilley and Paivio
experiment.

Many research programs on beginning
reading concentrate on the visual aspects
of the reading task. This emphasis appears
to be misdirected, at least when dealing with
simple grapheme-phoneme correspondences.
The major practical implication of the
present research is that efforts to facilitate
the acquisition of grapheme-phoneme correspondences
by prereading children should
be directed toward increasing response
availability rather than stimulus differentiation
or familiarity. Prereading children
apparently have little difficulty in dealing
with letters as stimuli but a great deal of
difficulty in dealing with letter sounds as
responses.

In Experiment 2, although both the 
graphemes and phonemes were selected to 
maximize the effect of similarity, there was 
no overall effect of these variables on the 
number of correct responses. Underwood, 
Runquist, and Schultz (1959) have dis- 
cussed the dual and opposed effects of re- 
sponse similarity on learning. On the one 
hand, increases in response similarity may 
facilitate learning by increasing response 
availability, while at the same time in- 
terfering with the associative component due 
to intrusion errors. In the present experi-
ment, there was no significant effect of
similarity on response availability as in- 
dexed by omission errors; however, there 
was a small but statistically significant 
negative effect of response similarity on the 
associative component as indexed by the 
number of intrusion errors. 

According to Underwood (1961, p. 215), 
an analogous effect could occur when stim-
ulus similarity is manipulated in pairs of
stimuli, as has been done in several experi- 
ments including the present one. Because 
of the constraints imposed by natural 
language materials (grapheme-phoneme 
pairs), it was only possible in the present 
experiment to manipulate similarity be- 
tween pairs of items in the list rather than 
across the list as a whole. Underwood (1961, 
p. 215) concludes that all stimulus items in 
the list should be similar to each other in 
order to obtain a stimulus similarity effect. 

From a practical point of view, it ap- 
pears that including two pairs of high- 
similarity grapheme-phoneme pairs in the 
same instructional block will have no sig- 
nificantly detrimental effect on learning.


BIASING EFFECTS OF DIAGNOSTIC LABELS AND SEX OF
PUPIL ON TEACHERS’ VIEWS OF PUPILS’
MENTAL HEALTH¹


PHYLLIS F. HERSON²
University of Maryland


One hundred and eighty teachers were asked to respond to a Thurs- 
tone-type scale measuring their view of the degree of psychological 
incapacitation manifested by four hypothetical pupil cases. Varia- 
tions in methods of describing pupils assigned randomly to the sub- 
jects were (a) diagnostic labels only, (b) behavioral descriptions only, 
and (c) behavioral descriptions with diagnostic labels affixed. In 
addition, designations in half of the cases in each variation were 
changed from masculine to feminine. A 4 X 3 X 2 analysis for repeated 
measures indicated that when diagnostic labels were used, mean 
scores on the incapacitation scale were significantly higher than 
when descriptions alone were used (p < .005). No significant overall
difference in mean scores was found between the two pupil sexes.


Currently used mental health terminology
has come to have great social, moral, judicial,
and even political implications in our society,
directly affecting decisions in such areas as
hospitalization, imprisonment, and educa-
tional and vocational advancement and
more subtly influencing a wide array of
social interactions. The focus of this study
is upon the implications of the use of such
terminology in a school setting.

While the importance of the classification
or labeling process has been given recogni-
tion in several theoretical propositions, the
most formalized one being that of Scheff
(1966), empirical support for this phenome-
non must be derived from peripheral areas
such as psychological studies of expectancy
effects in laboratory experiments (for ex-
ample, see Cordaro & Ison, 1963). Few
empirical studies were found that attempted
to assess the effects of the use of mental
health type labels in such natural settings
as the school. The effects of such labels
upon the attitudes and policies of teachers,
however, would seem to be of particular
concern because of the relative youth and
malleability of the people with whom they
interact on a day-to-day basis. Teachers,
furthermore, are in a crucially pivotal
position with regard to labeling in that they
are both consumers of and, in some instances,
originators of labels through the media of
cumulative records, case conferences, and
informal communications with other staff
members.


1 This article is based on the author’s disserta-
tion submitted in partial fulfillment of the require-
ments for the doctoral degree at the University of
Maryland.

2 Requests for reprints may be sent to Phyllis
F. Herson, 9104 Walden Road, Silver Spring,
Maryland 20901.

Some evidence, most notably from the 
Rosenthal and Jacobson study (1968), was 
found to suggest that labels denoting aca- 
demic potential may become self-fulfilling 
prophecies because of subtle changes, which 
result from the use of labels, in teachers’ 
interactions with pupils. No study was 
found, however, that attempted to assess 
such changes in teachers’ views of pupils as 
a result of the use of mental health type 
labels. 

An examination of the literature dealing 
with the semantics of mental health ter- 
minology, however, reveals that such labels 
may carry negative or misleading implica- 
tions. The replacement of the demonological 
model of mental illness with the medical 
model has, it would seem, not fully eradi-
cated older associations of mental illness 
with wild, bizarre behavior (Nunnally, 
1961) and has, in addition, brought about 
some misleading implications of its own. 
Among these are a view of the course of 
mental illness as a rigidly predetermined 
process and an emphasis upon causal agents 
within the individual as opposed to those 
in the social environment (Sarbin, 1967; 
Szasz, 1960). Currently used diagnostic 
labels, it has been further pointed out, are 
too broad, which results in a tendency to 
attribute all of the behavior associated with 
a category to an individual, even though 
only part of the behavior led to the original 
labeling (Ullmann & Krasner, 1969). 

It was further established in the literature 
that teachers today, to a greater extent than 
was formerly true, tend to be sensitive and 
aware of mental health phenomena and 
to look to mental health professionals as 
authorities and as sources of help (Bentz, 
Edgerton, & Miller, 1969). Though this is 
generally regarded with favor as indicative 
of greater enlightenment than was formerly 
the case, nevertheless, it is suggested that 
this heightened sensitivity and awareness 
could have the effect of rendering teachers 
more highly suggestible with regard to 
mental health labels than were their seem- 
ingly more skeptical forebears. 

The literature further suggests that the 
effects of verbal expectancy statements or 
labels would seem to be mitigated by a 
variety of observer characteristics such as 
sex (Stukat, 1958) and degree of training, 
experience, or familiarity with the phenome- 
non under observation (Ingraham & Har- 
rington, 1966). In addition, evidence sug- 
gests that certain pupil characteristics such 
as sex may influence teachers’ attitudes 
and policies toward deviant behavior 
(Yamamoto & Dizney, 1967). 

The purposes of this study then were (a) 
to determine whether the use of mental 
health type diagnostic labels has a biasing 
effect upon teachers’ assessments of the 
degree of incapacitation of hypothetical 
pupils, (b) to determine whether teachers’ 
assessments of the degree of incapacitation 
differ as a function of the sex of the pupil, 
and (c) to determine whether sex of the 
teacher or length of teaching experience are 
mediating variables with regard to these
effects. 


METHOD 


Subjects 


Subjects were 64 male and 116 female elemen-
tary and secondary school teachers enrolled in
summer classes at the University of Maryland.
Length of teaching experience ranged from 0 to 36 years.


Procedure 


From the population of 180, samples of 30 were
randomly assigned to six treatment groups or six
variations in stimulus material. Each respondent
was presented with descriptions of four types of
hypothetical pupils or cases—marginally retarded,
depressed neurotic, emotionally disturbed, and
paranoid schizophrenic—in one of six treatment
formats. Pupils or cases were described by means
of (a) diagnostic labels only, (b) behavioral de-
scriptions only, and (c) behavioral descriptions
with diagnostic labels affixed. Each of these
methods of description was further modified by
changing appropriate designations and pronouns
from masculine to feminine. For each of the four
cases, subjects were asked to respond to a Thur-
stone-type scale developed by the investigator to
measure their view of the degree of psychological
incapacitation manifested by the hypothetical
pupil.
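
The assignment described above (180 teachers, six treatment groups of 30, formed by crossing three description formats with two pupil sexes) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the seed, and the use of a simple shuffle are hypothetical, and the original study does not report how the randomization was actually carried out.

```python
import random
from itertools import product


def assign_to_treatments(subject_ids, seed=0):
    """Randomly split subjects into equal groups, one per treatment format.

    The six formats come from crossing three description methods with two
    pupil sexes, as in the Procedure section. The shuffle-and-slice method
    and the seed are assumptions made for this sketch.
    """
    description_methods = [
        "labels only",
        "descriptions only",
        "descriptions with labels",
    ]
    pupil_sexes = ["masculine", "feminine"]
    conditions = list(product(description_methods, pupil_sexes))  # 3 x 2 = 6

    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)

    group_size = len(shuffled) // len(conditions)
    return {
        condition: shuffled[i * group_size:(i + 1) * group_size]
        for i, condition in enumerate(conditions)
    }


# Hypothetical subject numbers 1..180 standing in for the teachers.
groups = assign_to_treatments(range(1, 181))
```

Each key of `groups` is a (description method, pupil sex) pair, and each value is the list of subject numbers that would receive that format.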


Materials 


Case descriptions and labels. The development
and validation of the four mental health labels
and matching case descriptions was accomplished
with the assistance of a group of advanced-level
interns in a program to train school psychologists
(n = 12). Half of the trainees (n = 6) were ran-
domly selected, given a group of mental health
type labels taken from formal psychological
nosologies or from common use, and asked to list
behavioral manifestations or characteristics that
would lead them to employ such a label. Behaviors
and characteristics recurring most frequently
were then incorporated into case descriptions by
the investigator, and the other half of the group
was asked to match the description to the EU
label. For cross-validation purposes the same
procedure was then repeated with the Orge.
group. The correct label was chosen unanimously
in both groups for each of the four labels and
matching case descriptions that were ultimately
selected for use in the study:

1. Marginally retarded: This elementary school
boy doesn't seem to have advanced as far as
others in the class in his school work. He usually
takes somewhat longer than the others to finish
his assignments. He seems to have particular
difficulty with tasks like word problems in arith-
metic, and his vocabulary is not as large as most
of the other children’s in the class.




2. Depressed neurotic: This junior high school 
boy is of average intelligence and usually com- 
pletes his assignments satisfactorily. He is very 
quiet, however, and doesn't speak much in class 
or to his classmates. He frequently seems de- 
pressed and is often dissatisfied and discouraged 
about his own performance. Several of his teachers 
have occasionally noticed him crying. Even 
though his teachers seem satisfied with his work, 
he seems to always blame himself for not doing 
better. He seems to worry a great deal about his 
future, which he feels doesn't look very bright. 

3. Emotionally disturbed: This elementary 
school boy frequently doesn't follow directions. 
He seems to have difficulty concentrating on his 
work. He occasionally will burst into laughter or 
tears or become very angry or upset for no appar- 
ent reason. He doesn't get along very well with 
the other children in the class. 

4. Paranoid schizophrenic: This high school 
boy has great difficulty concentrating on his work. 
He is very suspicious of his teachers and the 
others in his classes. He stays to himself much of 
the time. When spoken to he doesn’t seem to make 
sense at all. He states, for instance, that one of 
his teachers has put a mechanical device in his 
head that is controlling him. At times he has 
violent outbursts directed at one of his teachers 
or the other pupils. 

Degree of incapacitation scale. In accordance 
with procedures presented by Edwards (1957), a 
modified Thurstone equal-appearing-interval scale 
was developed by the investigator to measure the 
subjects’ assessments of the degree of psycho- 
logical incapacitation manifested by the four 
hypothetical pupils. The 18 agree-disagree type 
statements that comprised the scale had been 
ranked by a group of expert judges, 11 counselors 
at the University of Maryland Counseling Center, 
along a 7-point scale representing a continuum 
from an attitude or view of a pupil as minimally 
incapacitated to an attitude of great incapacita- 
tion. As delineated by the investigator, a pupil 
would be considered incapacitated to the extent 
that (a) his behavior was regarded as beyond the 
normal range or “abnormal,” (b) his prognosis 
was poor, (c) his condition was regarded as having 
a predetermined course and was embedded, 
chronic, and deep-rooted, (d) he was in need of 
specialist-type help, and (e) he had adverse effects 
upon his peers. 

The item with the lowest score weight (1.42)
was “It isn’t very unusual for pupils of this age
to act like this.” The highest score weight (7.00)
was assigned to the item, “A pupil like this should
be institutionalized.” An example of an item
receiving an intermediate score weight (3.25) was
“A pupil like this probably needs help through a
difficult period.” Interquartile ranges for the
judges’ rankings for the 18 items ranged from .00
to 1.64.

A pilot study conducted with graduate students
in a class to train remedial reading instructors
(n = 19) yielded a reliability coefficient for the
degree of incapacitation scale of .88 obtained by
the test-retest method with a one-week interval.


Statistical Analysis 


Data consisted of four scores for each subject 
on the degree of incapacitation scale—one for 
each of the four cases presented. Each of these 
scores was computed by taking the arithmetic 
mean of the scale values of the statements with 
which the subject agreed (Edwards, 1957). 
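
The scoring rule just stated, the arithmetic mean of the scale values of the statements a subject agreed with, can be sketched as follows. The three scale values used are the score weights quoted in the Materials section (1.42, 3.25, 7.00); the short item keys, the function name, and the treatment of a subject who agrees with no statement are assumptions made for this illustration only.

```python
from statistics import mean


def incapacitation_score(scale_values, agreed_items):
    """Return the mean scale value of the statements the subject agreed with.

    scale_values maps an item key to its Thurstone scale value.
    agreed_items lists the keys of the statements endorsed by the subject.
    Returning None when nothing is endorsed is an assumption; the source
    does not say how such a case was handled.
    """
    endorsed = [scale_values[item] for item in agreed_items]
    if not endorsed:
        return None
    return mean(endorsed)


# Item keys are hypothetical labels; the weights are those quoted in the text.
scale_values = {
    "not_unusual_for_age": 1.42,
    "needs_help_through_difficult_period": 3.25,
    "should_be_institutionalized": 7.00,
}

score = incapacitation_score(
    scale_values,
    ["not_unusual_for_age", "needs_help_through_difficult_period"],
)
```

With four cases per subject, the function would simply be applied four times, once for each case the subject rated.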

Data were analyzed by means of an analysis 
of variance procedure employing a 4 X 3 X 2 
factorial design with repeated measures in one 
dimension, that of the four cases. Computerized 
analysis was employed (Dayton, 1971). 


RESULTS


Significant differences were found (p <
.001) among mean scores on the degree of 
incapacitation scale for the three types of 
stimulus materials used—labels only, de- 
scriptions only, and descriptions with labels 
affixed (F = 15.95, df = 2). A post hoc 
analysis utilizing the Scheffé procedure 
indicated that the mean of the two condi- 
tions that employed labels combined was 
significantly higher than was the mean for 
the description-alone condition. No signifi- 
cant difference was found between the two 
conditions that utilized labels. 

Although a significant F ratio was found 
for interaction effects between type of 
stimulus material and sex of pupil (F =
3.91, df = 2, p < .05), these differences 
were not located with the Scheffé test. This 
seeming discrepancy is probably attribut- 
able to the highly conservative nature of this 
procedure as contrasted to the analysis of 
variance test. 

It was assumed at the outset that the four 
cases that were presented represented differ- 
ing degrees of severity or incapacitation, and 
no hypotheses were formulated with regard 
to these differences. The fact that significant 
differences did, in fact, occur among these 
“known groups” (F = 172.12, conservative 
df = 1, p < .001) may, however, be cited 
as evidence in support of the validity of 
the measuring instrument. (Conservative 
degrees of freedom were employed because 
of failure to meet the homogeneity of 
covariance assumption.) 

Significant interaction effects also oc- 
curred between type of stimulus material 
and the four cases (F = 8.99, df = 2, p <
.01). It was indicated in the post hoc analysis 
that the main treatment effects of type of 
stimulus material were accounted for by 
only two of the four cases, the marginally 
retarded and depressed neurotic. Main 
treatment effects were not significantly 
different for the emotionally disturbed and 
paranoid schizophrenic. 

No significant overall difference was found 
between the means for the two pupil sexes, 
and there was no significant interaction 
effect between sex of pupil and the four 
cases.

Correlation coefficients between sex and 
length of teaching experience of the subjects 
and scores for the treatment variations were 
of low magnitude and not statistically 
significant. 


Discussion 


Effects of Labels 


It should be observed that not only were 
the combined mean scores for the two condi- 
tions that utilized labels significantly higher 
than were those for the description-only 
condition, but that, when taken alone, the 
description-with-label-affixed mean was sig- 
nificantly higher than that of the descrip- 
tion-alone condition. This finding is of 
particular interest when considered in the 
context of several theoretical positions that 
were reviewed. Ellis (1967) has suggested 
that the negative connotations of labels 
might be reduced if the labels were accom- 
panied by operational definitions. If it can 
be assumed that the behavioral description 
that followed the label was, in effect, an 
operational definition, then Ellis' contention 
would not be supported by this study. 

These findings, on the other hand, can 
be reconciled with several other theoretical 
positions. Scheff’s (1966) proposition that 
it is the label that serves to stabilize what 
may be only a transitory condition might 
suggest that the label in this study added a 
connotation of stability, whereas without 
the label the condition might have been 
regarded as transitory or developmental in 
nature. Merton’s (1957) contention that 
once a meaning has been ascribed to an 
event, it is interpreted in the light of this 
ascribed meaning would also seem applica-
ble. The label in the description-with-label-
affixed condition could assume its potency
by ascribing meaning to the described be-
havior. It is finally suggested that
labeling effects might be interwoven, per-
haps inextricably, with the stigmatizing
effects of having sought help for mental
health problems, a phenomenon that is
supported in the literature (Phillips, 1963)
and by recent political example. Attempt


professional source, such as a psychiatrist
versus a teacher.

The significant interaction effects be-
tween the type of stimulus material and
cases indicated that the labeling effects that
were observed did not occur consistently
across the four cases. Post hoc findings
revealed significant differences in stimulus
material effects for the depressed neurotic
and marginally retarded cases, it is recalled,
but not for the emotionally disturbed and
paranoid schizophrenic cases. These results
afford some possibility for conjecture about
the qualities of labels that would render
them more or less likely to produce the
observed effects. One such quality might be
the severity of the disorder described. The
paranoid schizophrenic, for instance, is, of
course, a more seriously disturbed individual
than the depressed neurotic. Furthermore,
an examination of the accompanying de-
scription of the paranoid schizophrenic
reveals that even for this very serious dis-
order the symptoms described are exceed-
ingly bizarre. The pupil is said, for instance,
to make claims that the teacher has put


of the marginally retarded and the depressed
neurotic do not seem to present a very clear
departure from normal. The marginally
retarded child, for instance, is described as
taking longer to finish his tasks. It might be
hypothesized then that for less serious dis-
orders in which the symptoms do not depart
greatly from what is regarded as normal
behavior, it is the label that serves to or-
ganize the somewhat ambiguous behavior
into the abnormal category. It is the label,
in other words, that ascribes an abnormal 
meaning to the symptoms. In instances in 
which the symptoms themselves show great 
deviance from what is regarded as normal 
or are perhaps the most severe of the symp- 
toms associated with a given diagnostic 
category, the affixing of the label seems to 
have little effect. It remains, of course, for 
subsequent researchers to verify these ob- 
servations with a larger and more sys- 
tematically selected sampling of labels. 


Sex Differences 


No overall difference was observed be- 
tween the mean scores for subjects given 
cases designated as boys and those given 
cases designated as girls. Significant in- 
teraction effects between type of stimulus 
material and sex of pupil, however, were ob-
served. Further scrutiny revealed that the absence
of an overall difference might be accounted for by 
the fact that for girls the scores under the 
behavioral-description-alone condition were 
lower and under the labeling conditions 
somewhat higher than they were for boys. 
These differences, then, would tend to cancel 
each other out in the overall analysis for sex. 

Although the significant interaction ef- 
fects between sex of pupil and type of 
stimulus material were not located by the 
Scheffé procedure, a trend was observed in 
the direction of greater labeling effects for 
girls. This trend was accounted for not so 
much because of large differences between
boys and girls under the labeled conditions,
but because under the behavioral-descrip-
tion-alone condition the girls were viewed as
less incapacitated than were the boys. The
trend in this direction would be consistent
with the findings of Gurin, Veroff, and Feld
(1960), who concluded that in our society
there is greater tolerance of deviant be-
havior among females than among males. It
might also be cited as consistent with the 
recent findings of Broverman, Broverman, 
Clarkson, Rosenkrantz, and Vogel (1971) 
that behavior that is regarded as “sick” for 
people in general is regarded as normal for 
females. This trend, if verified, would then 
seem of interest to those investigating the 
role of the schools in perpetuating sex role 
stereotypes.


It would not seem justifiable on the basis 
of one limited study to recommend that the 
use of current mental health nosologies be 
abandoned in the schools. Moreover, 
teachers and mental health practitioners 
may justify the claim that the communica- 
tive value of such labels overrides the biasing 
consideration. A recommendation that is 
seen as appropriate, however, is that the de- 
velopment of greater sensitivity and aware- 
ness of the potentially biasing effects of 
labels be incorporated into the preparation 
and in-service training of teachers and men- 
tal health professionals in the schools. An 
obvious alternative to the indiscriminate use 
of labels, and one that efforts already are 
being made to implement, is greater use of 
observations, descriptions, and recommen- 
dations for action at the behavioral level. 


PERSONALITY TRAITS ASSOCIATED WITH EFFECTIVE
TEACHING IN RURAL AND URBAN SECONDARY SCHOOLS¹


KENNETH D. MATTSSON²


The Cattell Sixteen Personality Factor Questionnaire was adminis-
tered to 73 secondary students at the start of their student teaching. 
Near the end of the quarter their teaching effectiveness was assessed 
by administering the Hoyt-Grim Pupil Reaction Inventory to their 
pupils. Different patterns of personality traits associated with teach- 
ing effectiveness were found when subjects were grouped by level, by 
major field, and most distinctly, by size of community. Teaching 
effectiveness correlated in medium cities with Outgoing (A+), Emo-
tionally Mature (C+), Trusting (L−), Confident (O−), Group De-
pendent (Q2−), Relaxed (Q4−), and Low Anxiety (I−) factors and
in small towns with Sober (F−), Shy (H−), Sensitive (I+), Trusting
(L−), and Introverted (II−) factors. On eight of the above factors
there were direct and almost linear relationships between size of
community and strength of the correlation with teaching effective-
ness.


It would be pointless to dust off the his- 
torical record of man’s interest in teachers’ 
personality traits. But it would be useful to 
look at a few points of reference (and of 
curiosity) along the path over which we re- 
cently have come. 

Seventy years ago a variety of what we 
now see as “quaint characterizations” were 
still being made about the teachers of 
children, characterizations such as the fol- 
lowing written in 1902: 


Such constant, unrelieved association with ju- 
venile life has its narrowing effect upon the mind 
of the master. After a time he loses his sense of 
perspective, attaches undue importance to trifles, 
and is disposed to judge of the world at large by 
the ethical and other standards of the schoolroom 
[Adams, p. 34]. 


Witty, a quarter of a century ago, ob-
served that far too many teachers had de-
veloped an ideal of self that emphasized
self-denial, abstinence, and deprivation.
Not only could this result in the denial of
many normal satisfactions and appetites,
but also, “Such a personality tends to alien-
ate children and young people [1947, p.
669].”


1 This study is based on the author's doctoral
dissertation submitted to the University of Min-
nesota.

Appreciation is extended to Gordon M. A.
Mork for his guidance in the planning and execu-
tion of the study.

2 Requests for reprints should be sent to Ken-
neth D. Mattsson, who is now at the School of
Education, Mankato State College, Mankato,
Minnesota 56001.

At the same time that Witty was making 
this observation in the name of the mental 
hygiene movement, Symonds (1947) was 
summarizing research in a pessimistic way, 
“there is no pattern of personality that will 
make the best teacher [p. 653].”

Not surprisingly, the difficulties experi- 
enced in the 1940s and 1950s did not ex- 
tinguish research efforts. If anything, the 
study of teachers’ personality traits has been 
revitalized by the development of new as- 
sessment instruments, and it now attracts 
the interest of an ever widening group of 
educational researchers. For example, Sand- 
ven (1969), at the University of Oslo, con- 
cluded from his study of 931 Norwegian 
teacher trainees, “It appears that social co-
reaction³ represents a personality factor
which in a constructive way has relevance
to the role of teaching [p. 136].”


3 Sandven uses the term social coreaction to 
mean the ability and tendency to react em- 
pathically and with sympathy. 




Interest in teacher personality traits is 
still very much alive. The new concern for 
improved human relations within our 
schools is justification enough for greater 
effort to understand the personal qualities 
of those who are being called upon to effect 
these improvements. 

The present study was an attempt to 
discover relationships between teacher per- 
sonality traits as measured by the Sixteen 
Personality Factor Questionnaire (Sixteen 
PF), and success in classroom teaching as 
measured by the Hoyt-Grim Pupil Reac- 
tion Inventory (PRI). Since the available 
literature offered inconsistent and often con- 
tradictory findings, there was not sufficient 
basis for hypothesizing a specific set of per- 
sonality traits associated with effective 
teaching. The hypothesis tested in this 
study was, therefore, a general one aimed at 
discovering, rather than verifying, patterns 
of traits and was stated as follows: There is 
no relationship between scores attained by 
student teachers on the Sixteen PF and 
scores attained on the PRI. This hypothesis 
was tested for student teachers grouped by 
grade level, major field, and size of com- 
munity. 


METHOD 
Subjects 


The subjects of this study were 73 Mankato 
State secondary school student teachers (49 men 
and 24 women) in the subject matter areas of 
language arts (23), mathematics (12), science (13), 
and social studies (26), who were assigned to full- 
time off-campus student teaching in 35 cooperat- 
ing schools in southern Minnesota. 


Instruments 


The Sixteen PF, developed by Cattell, is an
inventory of


real, functionally unitary, and psychologically
meaningful dimensions of Samoan pes
upon factor analysis of raw data obtained from
behavior ratings, objective personality tests and
inventories, as well as from clinical and social
reaction patterns as observed in a wide range of
structured and unstructured situations [Cattell,
1957].

Henjum (1967) correlated scores on the Sixteen
PF with the PRI and found that successful
junior high student teachers were extraverted,
“well-adjusted,” warm, friendly, and partici-
pating, while successful senior high student
teachers were intelligent, enthusiastic, and prac-
tical. Other studies of teacher personality traits
that used the Sixteen PF include Mitchell (19…),
Tarpey (1965), Morrison and Romoser (19…),
Warburton, Butcher, and Forrest (1963), Isaacson,
McKeachie, and Milholland (1963), Burd…
(1963), and Kimmel (1964). An analysis of these
studies, plus an additional five reviewed by Hen-
jum (1967), yields no consistent pattern of per-
sonality traits associated with teaching effective-
ness. Part of the reason for a lack of consistency
could be the failure of many researchers to report
their subjects’ teaching fields, teaching levels, or
sex. But, more importantly, none of the research-
ers attempted to measure the very probable effects
of the different cultural environments associated
with rural communities on the one hand and larger,
more urban communities on the other.

One very interesting study by Kerkman (19…)
did relate size of the community with pupil be-
haviors. He found that in large-city schools pu-
pils spent more time attending to the teacher and
there was more organized group work and more
total teacher-pupil interaction, while classes
in small town schools were more heavily weighted
on autonomous action of pupils, heterogeneity of
patterns, and the engaging of the teacher by the
pupil.


In his review of research, Ryans (1960) con-
cluded that sex of teacher and size of community
were related to teacher effectiveness. The present
study was especially concerned with the latter
factor.

The PRI is an inventory of pupil responses
(agree, disagree, no opinion) to 200 statements
similar to those which pupils make about their
teachers and classroom experiences (e.g., “The
student teacher often doesn't seem to know …
here.” “We work in class, but we have fun too”).
The authors of the PRI found that it would yield
scores independent of classifications by school,
grade-level, and subject matter field and that it
was “a source of unique information about teacher
behavior as manifested in its effect upon psycho-
logical phenomena within pupils [Grim, Hoyt, &
Mayo, 1954, p. 84].”

The report by Henjum found the PRI to be a
satisfactory measure of classroom teaching effec-
tiveness and offers additional background infor-
mation on the instrument (1967, p. 56).


Procedure 


Forms A and B of the Sixteen PF were admit 
tered to the student teachers before they left 
campus. The measure of classroom teaching & 
tiveness, the PRI, was administered by the a 
vising teachers to the pupils (about 3,500) o! 
Student teachers during the last week of he. 
Each subject was asked to name the best and 
worst of his three classes and then the supervi 
teacher was requested to administer the PRE 
those two classes. 4 

Product-moment correlation coefficients were
computed between scores on the Sixteen PF and
mean scores on the PRI for subjects grouped
according to teaching level, subject matter area, and
size of community.
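
The product-moment computation named above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the five data pairs are invented for the example rather than taken from the study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented data: one personality-factor score and one mean PRI score per teacher.
factor_scores = [1, 2, 3, 4, 5]
mean_pri = [2, 1, 4, 3, 5]
r = pearson_r(factor_scores, mean_pri)
print(round(r, 2), round(t_for_r(r, len(factor_scores)), 2))
```

In the study this computation would be repeated once per personality factor within each grouping of subjects.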


RESULTS 


Scores on 20 personality factors (16 pri- 
mary and 4 secondary factors of the Sixteen 
PF) were correlated with scores on the PRI 
for 73 junior and senior high school student 
teachers. In the following discussion of cor- 
relations a one- or two-word description of a 
Sixteen PF factor is followed by the factor 
letter, and since each factor is bipolar, a 
plus or minus sign is included. 

As can be seen in Table 1, classroom ef-
fectiveness correlated significantly for junior
high teachers with Conscientious (G+),
Group-Dependent (Q2−), and Relaxed
(Q4−) factors and for senior high teachers
with the Group-Dependent (Q2−) factor.

When the subjects were grouped accord-
ing to subject matter area, it was found
that effectiveness in teaching language arts
(English and speech) correlated significantly
with the following factors of the Sixteen PF:
Shy (H−), Trusting (L−), Relaxed (Q4−),
and Introverted (II−). One significant cor-
relation each was found for math and
science teachers, but with 20 correlations
involved this could have happened by
chance. Effectiveness in social studies cor-
related (p < .01) with the Group-Dependent
(Q2−) factor.
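
The remark about chance findings among 20 correlations can be made concrete with a short calculation. This sketch assumes the 20 tests are independent, which does not strictly hold here because the four second-order factors are derived from the primary factors; the function name is hypothetical.

```python
def chance_findings(num_tests, alpha):
    """Expected number of chance 'significant' results, and the probability
    of at least one, under the assumption of independent tests."""
    expected = num_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** num_tests
    return expected, p_at_least_one

expected, p_any = chance_findings(20, 0.05)
print(expected, round(p_any, 2))
```

Under that assumption, about one significant correlation per group of 20 is expected even if no true relationship exists, which is consistent with treating the single math and single science correlations cautiously.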

The findings for teaching level and sub- 
ject matter area were strong enough to sug- 
gest recognizable patterns of personality 
traits that were associated with effective 
teaching in junior high and language arts 
but not in senior high nor in math, science, 
or social studies. 

Much stronger patterns were found when 
the subjects were grouped according to the 
size of the community in which they taught. 
The size-of-community categories were (a) 


TABLE 1
CORRELATIONS BETWEEN STUDENT TEACHER SIXTEEN PERSONALITY FACTOR (PF) SCORES
AND MEAN PUPIL SCORES ON THE HOYT-GRIM PUPIL REACTION INVENTORY
BY TEACHING LEVEL AND MAJOR FIELD


Personality factor                        Junior   Senior   Language   Math     Science   Social
                                          high     high     arts                          studies

Primary factor

Reserved-Outgoing                  A       .00     −.04     −.09        .42     −.19      −.08
Dull-Intelligent                   B      −.10      .20      .05       −.20      .42       .18
Emotionally Immature-Mature        C       .08     −.09      .25        .40      .11      −.14
Submissive-Dominant                E      −.06     −.18     −.87        .24     −.28      −.18
Sober-Enthusiastic                 F      −.17     −.07     −.26        .32     −.33      −.02
Expedient-Conscientious            G       .40*    −.04      .10       −.23      .04       .19
Shy-Venturesome                    H      −.16     −.09     −.42*       .20     −.20      −.08
Tough Minded-Sensitive             I       .15      .13      .25       −.88      .60*      .14
Trusting-Suspicious                L      −.09     −.06     −.43*      −.08      .20      −.01
Conventional-Imaginative           M       .04     −.18      .299      −.13     −.19      −.14
Forthright-Shrewd                  N       .00     −.03     −.03       −.02      .00      −.27
Confident-Apprehensive             O      −.15     −.20     −.21       −.66*    1178
Conservative-Experimenting         Q1      J1      −.11     −.07       −.09     −.09       .08
Group Dependent-Self-sufficient    Q2     −.38*    −.36*    −.09       −.34     −.45      −.49**
Casual-Controlled                  Q3      .20      .06      .83       −.00      .27      −.20
Relaxed-Tense                      Q4     −.87*    −.13     −.44*      −.32     −.17      −.04

Second-order factor

Low Anxiety-High Anxiety           I      −.21     −.09     −.29       −.42     −.09      −.01
Introverted-Extraverted            II     −.13     −.10     −.48*       .4      −.29      −.01
Responsive-Tough Poise             III    −.14     −.10     −.24        .22     −.35      −.18
Dependent-Independent              IV     −.16     −.24     2.13       −.15     −.22      −.25


* p < .05.
** p < .01.

small town: population of fewer than 2,000;
(b) small city: population of 2,000-7,499;
(c) medium city: population of 7,500-30,000;
and (d) metropolitan: (Minneapolis) and
some suburban areas.

The correlations listed in Table 2 show
distinct patterns of personality traits associ- 
ated with effective teaching in medium cities 
and small towns but not in metropolitan or 
small-city schools. 

In small towns, teaching effectiveness cor-
related significantly with Sober (F−), Shy
(H−), Sensitive (I+), Trusting (L−), and
Introverted (II−) factors. Teaching ef-
fectiveness in medium cities correlated sig-
nificantly with the following factors: Out-
going (A+), Emotionally Mature (C+),
Trusting (L−), Confident (O−), Group
Dependent (Q2−), Relaxed (Q4−), and Low
Anxiety (I−). On 8 of these 11 different fac-
tors the correlations were in opposite di-
rections for the two groups of teachers.

As is made more apparent in Figure 1, the
correlation coefficients on five of the factors
(A, C, F, H, and Second-Order II) were
strongly negative for the small town teachers
and positive for the medium city teachers
with the correlations for the small city
teachers generally falling midway between
those extremes. The direction of this rela-
tionship across the size-of-community di-
mension was just the opposite for Factors
O, Q4, and Second-Order I but even more
pronounced in its nearly straight-line pro-
gression from the small town of fewer than
2,000 population to the medium city of
7,500-30,000.


Discussion 


When the subjects of this study were 
grouped by teaching level and by subject 
matter area, the findings were minimally 
sufficient for the rejection of the null hy- 
pothesis. But when teachers in rural com- 
munities were compared with teachers in 
larger communities, the findings not only 


TABLE 2
CORRELATIONS BETWEEN STUDENT-TEACHER SIXTEEN PERSONALITY FACTOR (PF) SCORES AND MEAN
PUPIL SCORES ON THE HOYT-GRIM PUPIL REACTION INVENTORY BY SIZE OF SCHOOL COMMUNITY


Personality factor                        Small     Small     Medium    Metro-
                                          town      city      city      politan

Primary factor

Reserved-Outgoing                  A      −.45      −.05       .53*     −.25
Dull-Intelligent                   B       .18      −.06       .05       .26
Emotionally Immature-Mature        C      −.39      −.84       .07**     .03
Submissive-Dominant                E      −.19       .12      −.16      −.42
Sober-Enthusiastic                 F      −.52*      .01       .19      −.10
Expedient-Conscientious            G      −.18       .24       .24       .38
Shy-Venturesome                    H      −.62**     .14       .42      −.08
Tough Minded-Sensitive             I       .60**     .14       .21      −.11
Trusting-Suspicious                L      −.47*      .03      −.47*      ES
Conventional-Imaginative           M      −.08      −.07      −.08       .05
Forthright-Shrewd                  N      −.22       E         .45      −.09
Confident-Apprehensive             O       .23      − 124     −.58*      Wer
Conservative-Experimenting         Q1     −.04      −.15      −.05       .03
Group Dependent-Self-sufficient    Q2     −.05      −.50**     .47*     −.36
Casual-Controlled                  Q3     −.08       m 10      .39       .32
Relaxed-Tense                      Q4      .03      −.32      −.70**     .01

Second-order factor

Low Anxiety-High Anxiety           I       :         =        −.75**     =
Introverted-Extraverted            II      - p i HE =; sn
Responsive-Tough Poise             III    −.40      −.1       - 19       .05
Dependent-Independent              IV      .00      −.16      −.41      − 125

*p < .05.


Figure 1. Relationships between selected correlation coefficients (Sixteen Personality
Factor Questionnaire versus teaching effectiveness) and size of community.


necessitated the rejection of the null hy-
pothesis but also gave substantial evidence
that size of community was a more powerful
factor than were teaching level or subject
matter area in the relationship between suc-
cessful teaching and personality traits of the
teacher.

A careful analysis of the results reveals a
distinct pattern of traits that appears to be
related to successful teaching in the medium
city and an equally strong but nearly oppo-
site pattern related to successful teaching in
the small town school whose student body is
comprised largely of farm children.

For communities similar to those that
hosted the student teachers used as the sub-
jects of this study it can be inferred that the
personality traits of successful beginning
teachers in secondary academic areas range
over a continuum from general introversion
at one end to extraversion at the other.
Starting with the effective student teacher
in the small town who tends to be reserved, 
shy, sensitive, trusting, and introverted, the 
continuum encompasses the group-depen- 
dent small city teacher and terminates at the 
opposite end with the teacher in the medium 
city who tends to be more outgoing, mature, 
trusting, confident, relaxed, unanxious, and 
possibly extraverted. However, apparent 
contradictions in the small town and medium 
city patterns do appear in the findings. Why 
do these disparate types of teachers share 
the same value on Factor L, signifying that 
they are trusting, adaptable, free of jealousy, 
and easy to get along with? Any of the rela- 
tionships could, of course, have occurred by 
chance, but did both groups appear to be 
trusting because the opposite value on L 
(indicating a tendency to be suspicious, 
self-opinionated, and hard to fool) is so 
foreign to effective teachers that trusting is 
the only possible result? And if the effective 
teacher in the medium city tends to be
group dependent (Q2−, needs social and
group support), why does he also tend to be 
shrewd (N+, calculating, sophisticated, 
hardheaded)? 

The first explanation to come to mind is 
that the success criterion, the PRI, suffers 
from a major fault in that it makes the cal- 
culating cynic (adaptable and sociable for 
the sake of exploiting the group) look good in 
the classroom. But the PRI did not produce 
that same puzzling L−, N+, Q2− pattern
for any other group of subjects, only for the 
medium city teachers. It is hoped that 
further research designed to test these size- 
of-community patterns will help to clarify 
these apparent contradictions. 

Nevertheless, the significance of the cor- 
relations, the sharp contrasts, the continu- 
ous pattern, and the agreement with com- 
mon sociological information suggest that 
the cultural milieu of the secondary school 
in southwestern Minnesota has a powerful 
influence upon pupil perceptions of class- 
room experiences. 

The personality pattern of the effective 
teacher in the metropolitan (largely sub- 
urban) area does not fit into the continuum, 
supposedly because recent geographic mo- 
bility has resulted in a mixture of people 
from large cities, small cities, and rural en- 
vironments settling in the suburbs. The 
children of these people bring with them 
conflicting perceptions about teacher-pupil 
interactions in the classroom. As a corollary 
to this hypothesis, it can be expected that, 
in time, a new consensus of perceptions and 
expectations about teachers will synthesize 
in these areas of rapid change. 

The findings of this study strongly sug- 
gest that certain cultural variables associ- 
ated with size of community are strong de- 
terminants of the particular patterns of 
teachers’ personality traits that are related 
to classroom effectiveness. The clear impli- 
cation is that, similarly, there may be associ- 
ated with inner-city neighborhoods certain 
cultural variables that operate, in a sense, to 
require particular patterns of personality 
traits for successful teaching or, at least, 
for the avoidance of failure. 

In spite of the fact that research is gener-
ally more difficult to execute in the inner-
city school than in the small town school,
further research on teacher personality
traits should be concentrated in the inner 
city because these communities are cultur- 
ally unique and because the current needs of 
inner-city schools simply outweigh those of 
the suburban and smaller-city schools.


SEX DIFFERENCES IN THE ORGANIZATIONAL ASSIMILATION
OF BEGINNING GRADUATE STUDENTS IN PSYCHOLOGY¹


JOHN E. NEWMAN²


University of Illinois 


Sex differences in organizational assimilation were examined in a 
longitudinal study of beginning graduate students in psychology. The 
results of the study indicated that the female graduate students were 
significantly less satisfied and assimilated and that they experienced 
significantly greater role ambiguity and role demands than the males. 
There were no significant differences in role performance. These re- 
sults were discussed in terms of a process that differentially affects 
the role making and role adjustment of male and female graduate 
students. This information has implications for the assimilation 
policies of graduate departments and perhaps for any organization 
interested in the assimilation of its newcomers. 


The purpose of this study was to see if the 
role-making process differs for male and fe- 
male graduate students. Are there sex dif- 
ferences in such role variables as perception 
of role demands, role ambiguity, role satis- 
faction, role conflict, assimilation, and role 
performance for newcomers to the role of
graduate student? Although the answer to
this particular question is more than inter-
esting, the importance of this type of re-
search far exceeds the boundaries of the
setting utilized in this study. A theory of role 
making and role adjustment is relevant to 
all individuals, in all roles, in all organiza- 
tions. Therefore, the present research was 
undertaken primarily with the hope of con- 
tributing to the development of a general 
theory of role making and role adjustment 
which will serve as a basis for future re- 
search and for practical action. 

¹ The author wishes to express his sincere ap-
preciation to George B. Graen, Charles L. Hulin,
Thomas W. Johnson, and J. B. Orris for their in-
valuable consultation and guidance and to the
graduate students who generously participated in
the study. The author would also like to thank
Patrick R. Laughlin and Jeanne B. Herman for
their comments on an earlier version of this paper.

² Requests for reprints should be sent to John
E. Newman, who is now at 1 State Farm Plaza,
State Farm Insurance Company, Bloomington,
Illinois 61701.


The desirability of establishing a general
theory of role making and role adjustment
is clearly evidenced by the currently ex-
pressed interests in minority group selection,
training programs, and women’s liberation. 
These interests raise the question of whether 
there are differences in assimilation patterns 
between population subgroups. For example, 
much has been written concerning the 
unique problems that women face in aca-
demic settings (e.g., Astin, 1972; Kaley,
1971; Rossi, 1973), yet there is little empirical 
evidence to support any conclusive theory 
on why contemporary women find graduate 
education and professional occupations dif- 
ficult to pursue. The present study focused 
on these problems from an organizational 
assimilation perspective and examined the 
assimilation of male and female newcomers 
in an educational organization. 

What is organizational assimilation? The 
assimilation process was conceived as a role- 
making-role-adjustment process. What is 
the newcomer supposed to do? How does he
find out? Who or what are the sources for 
role demands? What kind of and how much 
pressure is put on the new organizational 
member to do this or that role activity? How 
does he perceive and respond to these role 
demands? How is the new person's role de- 
fined? How much latitude does the new- 
comer have in defining his own role? What 
are the events and processes of role making 
and role adjustment? 

The concept of assimilation derives its
meaning from the interrelationships among
the set of role variables mentioned at the 
beginning of this article. The dynamic na- 
ture of the variables and their relationships 
make it unlikely that anyone ever becomes 
completely assimilated. (Note that com- 
plete assimilation may mean that one's role 
is totally and finally defined, accepted by 
all concerned, and performed as such with 
complete satisfaction and satisfactoriness.) 
However, it is assumed that one can mea- 
sure the degree to which a newcomer or any 
organizational member feels assimilated. 
Note also that the conceptual framework 
and the methods involved in investigating 
the assimilation process offer an alternative 
approach to the study of organizational 
behavior, especially job satisfaction, job 
performance, and the elusive relationship 
between them. 

Studies of the graduate student's role- 
making process, particularly as perceived 
by the graduate student, have been rare. 
Baird's (1969) article presented a good, 
brief review of the relevant research. Baird's 
own study of the role relations of graduate 
students is most related to the present 
research. Baird attempted an objective 
description of the social relations of graduate 
students. A questionnaire containing items 
describing the graduate student role was 
sent to 1,500 graduate students who returned 
689 usable answer sheets. A factor analysis 
of these data extracted five factors to ac- 
count for the role relations and general 
adaptation of students to graduate school: 
the extent of the student's involvement in 
graduate peer groups, the rigor of academic 
demands, the degree of ambiguity and con-
flict in professors' demands, the accessibility
of the faculty, and the degree of tension the 
student experiences from these relations. 

Baird's (1969) study and the present 
study have common conceptual origins in 
the theory of roles in complex organizations 
(Biddle & Thomas, 1966; Kahn, Wolfe, 
Quinn, Snoeck, & Rosenthal, 1964; Katz & 
Kahn, 1966). However, the present study 
goes beyond Baird's study in three im- 
portant ways. First, this analysis provides a 
more dynamic picture. Baird provided a 
factorial description of the graduate stu-
dent’s role existing in but one time-slice of
data. In an effort to examine the dynamic
nature of role variables, the present lon- 
gitudinal study included three time-slices 
of data and attempted not only to identify 
the factors or variables that enter the as- 
similation process, but also to monitor 
these variables and their interrelationships 
over time. 

A second important addition concerns 
the nature of the focal subjects. Baird (1969) 
used a cross section of graduate students. 
By sampling graduate students at various 
degree stages, one is sampling students who 
have been in the organization for different 
lengths of time and, hence, one might ex- 
pect differences in their extent of assimila- 
tion (or in their adaptation to the depart- 
ment as Baird might say). In essence, one 
might expect such variables relevant to 
role analysis as role ambiguity and role 
satisfaction to be affected by time spent in 
the organization. Therefore, Baird's results 
may be confounded by the differential 
amounts of time spent by his subjects in 
their respective graduate departments. In 
fact, Baird apparently realized this and, 
hence, claimed validity for some of his 
scales which had consistent relations with 
stage of graduate career. What effect time 
had on the other variables and scales is not 
known. Nor is it clear what effect the com- 
bining of data representing students at 
various career stages had on the generality 
of the five factors Baird extracted to ac- 
count for the role relations of graduate 
students. One must also be careful if 
equating stage of career development with 
amount of time in the organization. For 
example, some students may have earned 
their master's degree in one year, other 
students in two or three years. 

The third important difference is that the 
present study investigates sex differences, 
whereas Baird did not. 

Therefore, the present study attempted
to control and enhance the research design
by (a) monitoring the role-related variables
over time (time-sampling problem
minimized), (b) focusing on an entire set
(subject-sampling problem minimized) of
first-year students beginning their graduate
education (and their assimilation process)
at the same point in time, and (c) examining
sex differences in the assimilation
process.

There was some basis for expecting sex 
differences in satisfaction. Smith, Kendall, 
and Hulin (1969) reported relatively con- 
sistent sex differences in job satisfaction as 
measured by the Job Description Index, 
with females being less satisfied. They sug- 
gested that such results may be related to 
possible differences in frames of reference 
and alternatives available for males and 
females. Hulin and Smith (1964) suggested 
that sex, per se, may not be the crucial 
factor which leads to high or low satisfac- 
tion. Instead, the entire set of variables 
which consistently covary with sex (e.g.,
pay, job level, promotion opportunities, and 
societal norms) may cause the differences in 
job satisfaction. It is important to note that 
the above findings were based on data ob- 
tained from men and women working in 
industrial settings. 

In light of the previous research, the 
purpose of this study was to measure and 
monitor over time the role variables believed 
to be important in the assimilation process 
and to see if this role-making process 
differs for male and female graduate stu- 
dents. The hypotheses were as follows: 


1. Females feel less satisfied with their 
role of graduate student than males. 

2. Females perceive their role to be more
ambiguous than males. 

3. Females feel less assimilated than 
males. 

4. Females perceive greater role demands 
than males. 

5. Females experience greater role con- 
flict than males. 

6. Females have lower role performance 
evaluations (i.e., grade point averages) than 
males. 


METHOD 


Research Strategy and Subjects 


It was believed that studying new graduate 
students during their first semester would be most 
appropriate for exploring the role-making and role 
adjustment process. The first few months of gradu- 
ate school is a period when role demands are likely 
to be very salient and the new student's role- 
related attitudes and behaviors most subject to 
measurable change. Therefore, 44 first-semester
graduate students in the psychology department
of a large public university participated as the 
focal subjects (14 females, 30 males). There were 
no significant differences between females and 
males in terms of Graduate Record Examination 
scores (mathematics: X̄ = 637 and 661, respec-
tively; advanced test: X̄ = 603 and 646, respec-
tively) or in terms of quality of undergraduate
school (Astin selectivity index: X̄ = 61.4 for each
group). 


Instruments 


A taxonomy of role activities was developed in 
the initial phase of the study by asking graduate
students and faculty to which activities first-
semester graduate students allocated time and
energy. Various samples of graduate students
evaluated the resulting list of 100 role activities 
and a consensus was reached that 32 of the role 
activities were, indeed, valid activities of first- 
semester graduate students. Graduate students 
were then asked to assign to each role activity a 
source of pressure or demand for involvement in 
that activity. This resulted in three scales for the 
role activities: departmental, social (fellow stu- 
dents), and personal. 

The social role activities scale consisted of the 
following items: (a) doing favors for my fellow 
students, (b) helping my fellow students with their 
problems, (c) listening to the complaints of my 
fellow students, (d) becoming an accepted member 
of the staff, (e) keeping good working relations 
with my fellow students, and (f) socializing with 
friends I have met through the department. 

The departmental role activities scale contained 
these items: (a) learning basic methodology and 
how to approach problems, (b) learning what 
amount of work is required, (c) achieving high 
grades, (d) doing busy work, (e) working on 
papers and projects, (f) handling routine prob- 
lems, and (g) learning the academic rules and 
regulations. 

The personal role activities scale included the 
following: (a) planning my time to deal efficiently 
with peak work loads, (b) becoming psycholog- 
ically adjusted to the graduate school routine, (c) 
finding out what courses are really like, (d) at- 
tending classes, (e) improving myself through my 
academic experiences, (f) getting things done in 
spite of the rules, (g) thinking about ways to 
improve my grades, (h) finding personal fulfill- 
ment as a graduate student, (i) getting myself 
going on my studies, (j) becoming master over my 
learning situation, (k) improving the quality of my 
work, (l) thinking about the bad aspects of getting 
an education, (m) making and keeping friendships, 
(n) learning who to go to for help on various 
problems, (o) doing my basic studying, (p) study-
ing things in which I am really interested, (q)
doing more than the minimum requirements, (r) 
reading texts and journals not assigned by the 
instructor, and (s) considering a change in dis- 
cipline, speciality, or graduate school. 

The perceived role demands questionnaire as-
sessed the magnitude of involvement demanded by
each of the three sources of pressure (depart-
mental, fellow students, personal). All role activi-
ties (i.e., all three scales) were listed for each
source of pressure. The student responded to each
role activity item by circling whether that source 
of pressure preferred that he spend more, the same, 
or less time than he presently spent on that ac- 
tivity. Items were scored 3 (more), 2 (same), and 1 
(less). This questionnaire yielded nine scores— 
three scale scores for each of the three demand 
sources. A scale score was the sum of the scale’s 
item scores. Because there were no sex differences 
in the perception of fellow students’ demands, the 
present report concentrates its analysis on the 
perceived departmental and personal role demands 
and the departmental and personal role activities 
scales only. 
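
The scoring rule just described (3 = more, 2 = same, 1 = less; a scale score is the sum of its item scores; three scales for each of three demand sources) can be sketched as follows. The data layout, the function name, the source labels, and the shortened item lists are assumptions made for illustration and are not part of the original instrument.

```python
# Item scoring as stated in the text: 3 (more), 2 (same), 1 (less).
SCORE = {"more": 3, "same": 2, "less": 1}

def scale_scores(responses, scales):
    """Sum item scores per (demand source, activity scale).

    responses: {source: {item: "more" | "same" | "less"}}
    scales:    {scale name: [items belonging to that scale]}
    """
    return {
        (source, scale): sum(SCORE[answers[item]] for item in items)
        for source, answers in responses.items()
        for scale, items in scales.items()
    }

# Shortened, illustrative item lists (the real scales have 7, 6, and 19 items).
scales = {
    "departmental": ["achieving high grades", "doing busy work"],
    "social": ["doing favors for my fellow students"],
    "personal": ["attending classes"],
}
all_items = [item for items in scales.values() for item in items]

# Invented answers: every item listed once under each source of pressure.
responses = {
    "department": dict.fromkeys(all_items, "more"),
    "fellow students": dict.fromkeys(all_items, "same"),
    "self": dict.fromkeys(all_items, "less"),
}

scores = scale_scores(responses, scales)
print(len(scores))  # three scales for each of three sources
```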

Satisfaction indices. The work scale of the Job 
Description Index (Smith et al., 1969) was adapted 
(8 items were omitted) to assess the newcomer's 
satisfaction with his role. Items were scored fol- 
lowing Smith et al. (1969, p. 79). The Hoppock 
(1935) index of overall job satisfaction was adapted 
to assess the newcomer's overall satisfaction with 
his role. This instrument contained four items, 
each with seven response alternatives. Each item 
was scored 1-7 (the higher score indicating greater 
satisfaction). 

The role ambiguity index assessed how ambigu- 
ous the student perceived his role to be. This was 
a single 5-point scale administered at the end of the
semester data collection. It was scored 1 (not am- 
biguous at all) to 5 (very ambiguous). 

The assimilation index assessed how assimilated 
the student felt he was. This index contained one 
item with seven response alternatives and was 
scored 1 (I am not assimilated in the department 
at all) to 7 (I am completely assimilated in the 
department).

The above instruments were assembled into one
questionnaire.

The role performance indices, first-semester and
first-year grade point averages, were used to 
assess role performance. They had a possible range 
of from 1 (E) to 5 (A). 


Conduct of the Study 


The research design was one of repeated mea- 
sures at three points during the students’ first 
semester in graduate school. Data were collected 
at three 1-hour evening sessions—one during the 
second week of the semester, one at midsemester, 
and one during the last week of the semester. 


Analysis 


Univariate and multivariate analysis of vari- 
ance techniques were used to assess the effect of 
sex, time, and their interaction on the role vari- 
ables. Discriminant analysis was used to further 
understand the nature of the sex difference in role 
satisfaction. Correlational analysis was utilized to 
examine the interrelationships of the role vari-
ables.


RESULTS


Role Satisfaction 


Although satisfaction with the role 
declined over the semester for both groups, 
by the end of the first semester, females 
were significantly less satisfied with their 
role than were the males. Figure 1 illustrates 
this sex difference as measured by the 
adapted Hoppock (1935) index of overall 
role satisfaction. This finding lends support 
to Hypothesis 1. 

To gain some understanding of the nature 
of this sex difference in satisfaction, the role 
descriptive items on the adapted Job De- 
scription Index were examined. Since there 
is a problem in interpreting differences be- 
tween groups when each variable of a multi- 
variate set is considered singly (see 
Tatsuoka, 1970), a multivariate discrim- 
inant analysis of the 15 Job Description 
Index role descriptive items was used to 
describe the group differences (Table 1). 
The estimated multivariate analogue of 
omega squared indicated that 59% of the 
variance in role satisfaction could be ac- 
counted for by group differences (i.e., sex 
differences). Interpretation of the discrim-


Figure 1. Sex differences in role satisfaction.
(Significant sex difference on intrasession analysis
of variance at T3, p < .05. Significant sex difference
on repeated-measures analysis of variance, p <
.05. Abbreviations: T1 = beginning of semester,
T2 = midsemester, and T3 = end of semester.)


TABLE 1
RAW MEAN ENDORSEMENT VALUES AND DISCRIMINANT
FUNCTION COEFFICIENTS FOR THE JOB
DESCRIPTION INDEX ITEMS AT THE END
OF THE SEMESTER


Job Description            Raw X̄             Normalized      Scaled
Index item                                    discriminant    coefficient
                        Female    Male        function
                                              coefficient

Fascinating              .85      Oe)          .8i             2.35
Routine                  .50      .80         −.09            −.68
Satisfying               .86     1.77          .29             2.48
Boring                  1.93     1.87          .08             T4
Good                    1.71     1.83         −.27            −2.33
Creative                 .93     1.27          .22             1.94
Respected                .93     1.67          cu              1.32
Pleasant                1.50     1.53          .09             .76
Useful                  2.29     1.97         −.22            −1.79
Tiresome                 .93      .87          .03             .27
Healthful                .21      .78          .24             1.21
Challenging             1.64     1.63         −.48            −4.43
Frustrating              .57      .87         −.38            −3.11
Simple                  2.21     2.40          .09             .65
Gives sense of
  accomplishment         .21     1.30          .40             2.07


Note. The discriminant function was significant
beyond the .01 level.


inant function was based on the scaled 
variable loadings (Table 1). This inter- 
pretation suggests that those who score on 
the positive or high end of the role satisfac- 
tion dimension describe their role as fas- 
cinating, satisfying, and offering a sense of 
accomplishment, while those who score on 
the negative or low end of the dimension 
describe their role as frustrating. The group 
means on this dimension were .56 for the 
males and —.57 for the females. It should be 
noted that two other role descriptive items, 
“challenging” and “good,” have large scaled 
coefficients but serve primarily as sup- 
pressor variables. 


Perceived Role Ambiguity 


As hypothesized, at the end of the se- 
mester, females perceived their role to be 
more ambiguous than did the males. The 
difference between the mean for the females 
(X = 3.71) and the mean for the males 
(X = 3.03) was significant at the .05 level. 
Role ambiguity was negatively correlated 
with role satisfaction (both indices) but 
only significantly so for the males (Table 2). 
(Note that all the correlations discussed in
the “Results” refer to the end of the se-
mester data collection and are presented in 
Table 2. Also, correlations for males and 
females were sometimes of the same mag- 
nitude but were only significant for the 
males because of the difference in the size 
of the two groups.) 
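
The point about group size can be illustrated by computing the smallest correlation that reaches significance for each group. This is a sketch under stated assumptions: a two-tailed test at the .05 level, every correlation based on all 14 females or all 30 males, and critical t values copied from a standard t table. The names T_CRIT_05 and critical_r are hypothetical.

```python
import math

# Two-tailed .05 critical t values from a standard t table, keyed by df = n - 2.
T_CRIT_05 = {12: 2.179, 28: 2.048}

def critical_r(n):
    """Smallest |r| significant at the two-tailed .05 level for n pairs."""
    df = n - 2
    t = T_CRIT_05[df]
    # Invert t = r * sqrt(df / (1 - r^2)) to solve for r.
    return t / math.sqrt(t * t + df)

for label, n in (("females", 14), ("males", 30)):
    print(label, n, round(critical_r(n), 3))
```

With these assumptions a correlation must exceed roughly .53 in absolute value to be significant for the females but only about .36 for the males, so a coefficient of, say, .45 would be significant in one group and not in the other.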


Assimilation


At each data point, the graduate students 
were asked how assimilated in the depart- 
ment they felt they were. At semester's end, 
females were significantly (p < .05) less 
assimilated than the males who then felt 
about halfway assimilated (Figure 2). This 
important finding made Hypothesis 3 
tenable and suggested that females were 
having a more difficult time in their role- 
making-role-adjustment process. Assimi- 
lation was positively correlated with overall 
role satisfaction for both males and females 
(Table 2). It is interesting to note, however, 
that assimilation correlated negatively with 
role ambiguity (—.57) for males but near 
zero (.08) for the females (Table 2). 


Perceived Role Demands 


This analysis was concerned with sex 
differences in the perception of demands for 
involvement in departmental and personal 
role activities. Of special interest were four 
distinct perceptions: (a) perceived depart- 
mental demand for involvement in depart- 
mental role activities (DDa), (b) personal 
preference for involvement in departmental 
role activities (PPa), (c) perceived depart- 
mental demand for involvement in personal 
role activities (DDp), and (d) personal
preference for involvement in personal role
activities (PPp).

Females perceived greater role demands 
(with respect to magnitude of involvement 
demanded) than did males throughout the 
first semester (Figures 3 and 4). Repeated- 
measures analysis of variance showed this 
sex difference to be very significant on two 
(DDa and PPp) of the four perceptions of
role demands and relatively so on a third 
(PPa; Table 3). These results offer partial 
support for Hypothesis 4. 

Examination of the intrasession analyses 
of variance indicated that at the beginning 
of the semester there were no significant sex


TABLE 2
INTERCORRELATION MATRICES OF ROLE VARIABLES ASSESSED AT THE END OF THE SEMESTER FOR
FEMALES AND MALES


Variable                       1      2      3      4      5      6      7      8      9     10     11

Females (a)
 1. DDa
 2. DDp                      [Ad]
 3. PPa                       .18    .22
 4. PPp                       .05    .00    .47
 5. |RCa|                     .49    .14   -.48   -.20
 6. |RCp|                     .46    .53    .18    .83    .50
 7. Job Description Index   [-21]   -.09    .05    .28   -.38   -.28
 8. Hoppock                  [ll]    .00    .07    .01   -.45   -.41    .82
 9. Assimilation             -.10   -.49   -.24    .03    .00   -.32    [E]    .52
10. Ambiguity                 .20   -.24   -.09    .05    .37    .44   -.35   -.48  [-08]
11. GPA (first semester)    [-.0]   -.72    .10   -.12   -.47   -.59    .31    .21    .41    .26
12. GPA (first year)         -.30   -.76   [ll]  [-25]   -.64   -.68    .44    .38    .48   -.09    .75

Males (b)
 1. DDa
 2. DDp                       [E]
 3. PPa                     [+24]    .45
 4. PPp                       .54   [AT]    .12
 5. |RCa|                     .38    .17   -.52    .33
 6. |RCp|                     .08   -.39   -.06  [+22]    .15
 7. Job Description Index    -.37   -.04   -.01   -.17   -.25   -.36
 8. Hoppock                  -.87   -.15   -.31   -.15   -.06   -.40    .80
 9. Assimilation             [.0]   -.17   -.27   -.18    .05   -.32    .31    .54
10. Ambiguity                 .06   [Al]    .29    .02   -.14   -.18   -.27   -.48   -.57
11. GPA (first semester)     -.24   -.40   -.11   -.14   -.21   -.23    .13    .18    .11    .01
12. GPA (first year)         -.18   -.34   -.12    .02   -.14   -.18    .32    .32    .00   -.08    .76


Note. Variables 1-4 are role demands, 5-6 role conflict, 7-8 role satisfaction, and
11-12 role performance. Abbreviations: DDa = perceived departmental demand for
involvement in departmental role activities; DDp = perceived departmental demand for
involvement in personal role activities; PPa = personal preference for involvement in
departmental role activities; PPp = personal preference for involvement in personal
role activities; |RCa| = absolute role conflict with respect to departmental role
activities; |RCp| = absolute role conflict with respect to personal role activities;
and GPA = grade point average. Entries in brackets are illegible and are shown as printed.

(a) n = 14, r = .53, df = 12, p < .05, two-tailed test.

(b) n = 30, r = .36, df = 28, p < .05, two-tailed test.


differences on the four role demand perceptions
(Table 4). From examination of Figures
3 and 4 it appears that males and females
did begin the semester with very similar
overall perceptions (with the possible exception
of PPp) of departmental demands
and personal preferences for involvement
in the various role activities.

At midsemester the sex differences were
in full bloom. Females perceived greater
(now statistically significant) demands for
involvement in role activities than did the
males (Table 4). Male and female perceptions
of role demands had diverged dramatically
(Figures 3 and 4). Females had
perceived an increase in the departmental
demands for more involvement and had
also increased their personal preference for
involvement. The males’ perceptions remained
relatively stable (Figures 3 and 4).

Although the data (Figures 3 and 4) suggest
that females continued to perceive
greater role demands than the males at the 
end of the semester, the difference was 
statistically significant for only one (DDa) 
of the four perceptions (Table 4). 

In summary, the data indicate that the 
female graduate students did, indeed, ex- 
perience greater role demands than did the 
males throughout the semester. Note also 
that both males and females perceived (a) 
greater departmental demand than personal 
preference for involvement in departmental 
role activities (Figure 3) and (b) greater 
personal preference than departmental de- 
mand for involvement in personal role ac- 
tivities (Figure 4). This represents an 
indication of the validity of the a priori role 
activities scales. 

Perceived role demands (DDa only) had a
significant negative correlation with satisfaction
for the males only (Table 2). The
DDp perception had a significant positive
correlation with role ambiguity for the
males (Table 2).


Role Conflict 


Two measures of role conflict were derived 
by taking the absolute difference between 


Figure 2. Sex differences in assimilation (vertical
axis: assimilation; horizontal axis: time).
(Abbreviations: T1 = beginning of semester, T2 =
midsemester, and T3 = end of semester.)

Figure 3. Sex differences in perception of role
demands with respect to the departmental role
activities scale. (Legend: solid line = perceived
departmental demand for involvement and broken
line = personal preference (demand) for involvement.
Abbreviations: T1 = beginning of semester, T2 =
midsemester, and T3 = end of semester.)


the perceived departmental demands (DD) 
and personal demands (or preferences; PP) 
with respect to (a) the departmental role 
activities scale (|DDa − PPa| = |RCa|)
and (b) the personal role activities scale
(|DDp − PPp| = |RCp|). Note that this
absolute measure of role conflict gives only 
the magnitude of conflict and not the direc- 
tion. Although the data (Figure 5) indicate 
that females generally experienced a higher 
magnitude of role conflict than the males, 
the repeated-measures analysis of variance 
(Table 3) and intrasession analyses of 
variance (Table 4) revealed no statistically 
significant sex differences in role conflict on
either of the above scales. Therefore, Hypothesis
5 failed to receive support.
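
The role conflict index described above can be written out in a few lines. This is a minimal sketch with hypothetical ratings for a single student (not data from the study); the function and variable names are illustrative assumptions.

```python
def absolute_role_conflict(departmental_demand: float,
                           personal_preference: float) -> float:
    """Magnitude of the gap between perceived demand and own preference."""
    return abs(departmental_demand - personal_preference)


# Hypothetical ratings for one student (not data from the study).
ratings = {"DDa": 5.0, "PPa": 3.5, "DDp": 2.0, "PPp": 4.0}

rc_a = absolute_role_conflict(ratings["DDa"], ratings["PPa"])  # departmental scale
rc_p = absolute_role_conflict(ratings["DDp"], ratings["PPp"])  # personal scale
print(rc_a, rc_p)
```

In this example the department demands more than the student prefers on the departmental scale and less on the personal scale, yet both indices come out positive. That is the loss of direction the text notes.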

At the end of the semester, role conflict 
had a significant negative correlation with 
role satisfaction (both indices) for males 
(Table 2). Interestingly, both measures of 
role conflict had significant negative cor- 
relations with role performance (grade point 
averages) for the females (Table 2). 

The personal role activities, and especially


Figure 4. Sex differences in perception of role
demands with respect to the personal role activities
scale (vertical axis: magnitude of role demands;
separate curves for females and males).
(Legend: solid line = perceived departmental
demand for involvement and broken line =
personal preference for involvement. Abbreviations:
T1 = beginning of semester, T2 = midsemester,
and T3 = end of semester.)


perceived departmental demand for involvement
in them (DDp), may have been
a major source of the females’ role adjustment
problems. The DDp significantly correlated
.53 and −.39 with role conflict
(|RCp|) for females and males, respectively
(Table 2). Seemingly, the greater the DDp
perceived by the females, the greater their
role conflict, and the greater the DDp
perceived by the males, the less their role
conflict. Remember also that DDp increased
significantly over the semester, especially
for the females (see Table 4 and Figure
4).


Role Performance 


Contrary to Hypothesis 6, there were 
no significant sex differences in role per- 
formance as indexed by grade point averages. 
The mean grade point averages for the 
first semester and for the first year, respectively,
for the females were 4.73 and 4.80
and for the males were 4.85 and 4.84. Both
|RCp| and DDp were significantly negatively
correlated with grade point average for
females, while only DDp was so related for
the males (Table 2). (Note that correlations
involving role performance may be more 
limited in this sample than in others because 
of the relatively restricted range of the 
grade point averages.) 
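
The effect of a restricted range can be illustrated with the standard correction for direct range restriction (Thorndike's Case 2). The authors report no such correction. The sketch below uses hypothetical values and only shows how much a correlation observed in a narrow-range sample might understate the correlation in a wider-range population.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case 2 correction.

    sd_ratio is the unrestricted SD divided by the restricted SD
    (a value above 1 means the sample range is narrower).
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Hypothetical values: observed r = .30, population SD twice the sample SD.
print(round(correct_for_range_restriction(0.30, 2.0), 2))
# With no restriction (ratio = 1) the correlation is left unchanged.
print(correct_for_range_restriction(0.30, 1.0))
```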


DISCUSSION AND SUMMARY


The broad purpose of this research was 
to investigate the assimilation process (i.e., 
role-making-role-adjustment process) of 
newcomers in an organization. This partic- 
ular study examined considerable evidence 
for sex differences in the assimilation of 
students beginning graduate study in a 
psychology department. Females were sig- 
nificantly less satisfied and assimilated and 
they experienced significantly greater role 
ambiguity and role demands than the males. 
Although there was no significant sex dif- 
ference in role performance (grade point 
average), the data overall suggest that the 
females may have had a more difficult time 
obtaining a level of performance comparable 
to the males. 

These results suggest the operation of a 
process that differentially affects the role 


TABLE 3
F VALUES FOR REPEATED-MEASURES ANALYSIS OF
VARIANCE FOR SEX DIFFERENCES IN
PERCEPTION OF ROLE DEMANDS
AND ROLE CONFLICT


Source      DDa      DDp       PPa     PPp      |RCa|   |RCp|
Sex (S)     5.79**   .99       3.44*   5.78**   .12     2.40
Time (T)    1.27     6.74***   1.50    .38      3.56    .82
S x T       .93      .28       .72     1.01     .90     .72


Note. Abbreviations: DDa = perceived departmental
demand for involvement in departmental
role activities; DDp = perceived departmental
demand for involvement in personal role activities;
PPa = personal preference for involvement
in departmental role activities; PPp = personal
preference for involvement in personal role
activities; |RCa| = absolute role conflict with
respect to departmental role activities; and
|RCp| = absolute role conflict with respect to
personal role activities.

making and role adjustment of males and
females during the first semester in a graduate
program of psychology. The results do
not specify in detail how the process oper- 
ates, but they do give us some idea about 
what is happening and when it is happen- 
ing. Seemingly, this process contributes to 
increasing the magnitude of the perceived 
role demands and role conflict for the fe- 
males, while this is less so for the males. 
Perhaps, these perceptions then differen- 
tially affect their role satisfaction and role 
performance. The causal threads which are 
necessary to order these data and to explain 
these phenomena remain at a distance, but 
they are coming into view. 

Although additional studies are needed 
to understand the assimilation process of 
graduate students in psychology depart- 
ments, the differential reactions of females 
and males suggest that the current as- 
similating process may be somewhat more 
compatible with males than females. In 
light of the foregoing sex differences, perhaps 
graduate departments should reconsider the 
assimilation process and make necessary 
modifications. 

In relation to Baird's (1969) study, the 
present study indicates the importance of 


TABLE 4
F VALUES FOR INTRASESSION ANALYSES OF VARIANCE
FOR SEX DIFFERENCES IN PERCEIVED
ROLE DEMANDS AND ROLE CONFLICT


Perception   Beginning of semester   Midsemester   End of semester
DDa          .35                     5.48*         5.61*
DDp          .48                     1.35          .58
PPa          .40                     5.52*         3.09
PPp          2.55                    9.02**        2.91
|RCp|        [aT]                    2.89          2.33
|RCa|        .40                     [a)]          1.01


Note. Abbreviations: DDa = perceived departmental
demand for involvement in departmental
role activities; DDp = perceived departmental
demand for involvement in personal role activities;
PPa = personal preference for involvement
in departmental role activities; PPp = personal
preference for involvement in personal role
activities; |RCa| = absolute role conflict with
respect to departmental role activities; and
|RCp| = absolute role conflict with respect to
personal role activities. Entries in brackets are
illegible and are shown as printed.

* p < .025.

** p < .005.


Figure 5. Sex differences in role conflict (vertical
axis: role conflict; horizontal axis: time; separate
curves for females and males).
(Legend: solid line = absolute role conflict with respect
to departmental role activities and broken line =
absolute role conflict with respect to personal role
activities. Abbreviations: T1 = beginning of
semester, T2 = midsemester, and T3 = end of
semester.)


complementing static, cross-sectional studies 
with longitudinal studies of homogeneous 
populations. The present study also makes 
it clear that future research on graduate 
student role relations must include an ex- 
amination of potential sex differences. 

In relation to current questions con- 
cerning the supposed dramatic change 
occurring in modern woman’s role, the 
present study provided some evidence that, 
although some women have “come a long 
way, baby,” getting there is not half the fun. 
That is, although females attained a level of 
role performance comparable to the males, 
they experienced less satisfaction and as- 
similation and more role demands, role 
ambiguity, and role conflict than did the 
males. This is essentially congruent with 
Kaley’s (1971) conclusion that while there 
is some acceptance of the woman in a pro- 
fessional role on a theoretical level, there is 
rejection on an applied level. 

Future research should not only further 
test the sex-difference hypotheses explored 
here, but should also consider research 
questions such as, Are there differences in
the norms for males and females? What is
the effect of male faculty or female faculty 
(or the lack of)? What modes (reinforce- 
ment, dependency, punishment, imitation, 
etc.) does the organization use to assimilate 
its members? How formalized are these 
modes? Future research should also monitor 
the assimilation process over the entire time 
span spent in graduate school. 

Though the present study looked at only 
one new class of graduate students in one 
psychology department, the research strat- 
egy used seems quite promising and worthy 
of further refinement, since it very likely is 
applicable to the study of the assimilation 
(and satisfaction, performance, etc.) of 
other role incumbents in other types of 
organizations.


FORMAL DISCIPLINE REVISITED:


AFFECTIVE ASSESSMENT AND NONSPECIFIC TRANSFER¹


Two meaningfulness measures, association value and reinforcement
value, are considered as potential factors in the nonspecific transfer 
of learning; the first as a frequency measure and the second as an 
affective dimension of meaningfulness. Experiment 1 helps establish 
that reinforcement value cannot be reduced to nomothetic associa- 
tion value; Experiments 2 and 3 suggest that reinforcement value 
enters into nonspecific transfer in paired-associate learning while 
association value does not. Moving from negative reinforcement 
value to positive reinforcement value paired-associate lists results in 
the greatest positive nonspecific transfer. Moving from positive re- 
inforcement value to negative reinforcement value paired-associate 
lists results in the least positive nonspecific transfer, and even in 
negative nonspecific transfer. Formal discipline theory may represent
an inaccurate conceptualization of an accurate reinforcement
value phenomenon.


The study of transfer in learning was 
originally stimulated by the subsequently 
discredited formal discipline thesis (Hall, 
1971, pp. 357-358). Morgan (1906, p. 192), 
an advocate of this doctrine, held that the 
training of a student in Latin was invaluable 
in developing his “faculties” for comparison 
and generalization, which in turn would 
redound positively on his learning of English. 
As is well known, Thorndike and Wood- 
worth (1901a, 1901b, 1901c) conducted a 
series of studies to set the formal discipline 


1 Experiment 2 and portions of Experiment 3 
were supported by Grant 5-71-0039 (509) from the 
U. S. Office of Education, Department of Health, 
Education, and Welfare awarded to Purdue Uni- 
versity, with Joseph F. Rychlak as principal 
investigator. The opinions expressed herein, 
however, do not necessarily reflect the position or 
policy of the U. S. Office of Education, and no 
official endorsement by the U. S. Office of Educa- 
tion should be inferred. The authors would like 
to thank Marshall R. Schmidt and Sherry L. 
Felbain for their assistance in data collection. 

2 Requests for reprints should be sent to Joseph 
F. Rychlak, Department of Psychological Sci- 
ences, Purdue University, West Lafayette, In- 
diana 47907. 


thesis into a decline from which it has never 
recovered. One of the main outcomes of the 
Thorndike-Woodworth argument was to 
solidify the construct of transfer in learning 
to a specificity thesis. Only those aspects of 
Latin which directly related to English as 
stimuli (as in word roots, etc.) could be 
given credit for improvements to be noted 
in a subject’s performance across tasks. 
Unrelated tasks, such as learning arithmetic 
to improve grammar, could not be seen to 
enter into a transferred facilitation across 
areas of study (see Thorndike, 1914, p. 268). 

This line of thought was carried into 
verbal learning as a concern with the 
specific characteristics of successive paired- 
associate lists. In order to evaluate the 
possible influence of factors not related to 
specific transfer effects, a control condition 
was employed of the order A-B, C-D. The 
specific factors of an A-B, A'-B sequence 
(similar stimuli and identical responses 
across two paired-associate lists) or an A-B, 
A-C sequence (identical stimuli and un- 
related responses) could thus be compared 
with control conditions (A-B, C-D) in
which no overlap in stimuli or responses
took place. 

Of course, this did not mean that im- 
provement across the A-B, C-D lists was 
lacking. Subjects demonstrated what has 
since been termed a nonspecific (positive)
transfer, which is usually defined as the 
improvement in performance to be noted 
across lists independent of list content 
(Kausler, 1966, p. 361). Note that though 
specific transfer (Thorndike, 1914) can be 
said to take a positive (improving) or nega- 
tive (declining) direction, nonspecific trans- 
fer has been theoretically conceptualized as 
exclusively positive in nature. Theoretical 
explanations of specific transfer have fol- 
lowed mediation theory, with the constructs 
of stimulus discrimination and response 
integration given major emphasis (see, e.g., 
Kjeldergaard, 1968). Explanations of non- 
specific transfer have usually emphasized 
the constructs of “warm-up” and “learning- 
to-learn" (see, e.g., Kintsch, 1970, pp. 35- 
36). 

Aside from the belief in “faculties,” 
which might be improved on the analogy of 
muscle training, the core distinction between 
formal discipline and the specificity thesis 
concerns just how much the subject brings 
to the task even when it is a novel one. 
Formal discipline held that the subject 
“comes at" the novel learning task with a 
more or less well-developed capacity to 
influence the outcome of what is eventually 
learned. If highly developed, this capacity 
leads to improvement; if poorly developed, 
a relative detriment in learning is likely to 
be the case (Roark, 1895, pp. 271-279). 
Specificity theory holds that only those 
antecedent experiences in identical or highly 
similar tasks can influence outcome in a 
seemingly novel task. If this previous ex- 
perience is flatly contradictory to the 
present (novel) task, a decline in per- 
formance (negative transfer) will be the 
result. Certain “tricks of the trade" (learn- 
ing to learn) incidental to the specific con- 
tent of the material to be learned may also 
be learned; or, thanks to the motoric action 
of “warm-up” in the succession of tasks, a 
facilitation in rate may be induced. But 
nothing in the subject's conceptual, cognitive,
or related intellectual capacities of a
nonspecific character can be credited with
inducing transfer effects. 

The area of study in verbal learning 
which relates most directly to the question 
of what the subject might or might not 
bring to a novel situation is that of meaning 
(i.e., the point or significance which a sign 
or symbol carries for the subject) and its
corollary metric of meaningfulness (i.e., the 
extent of the significance carried by the 
sign or symbol for the subject in question). 
The consonant-vowel-consonant trigram can 
be adapted as a controlled measure of 
meaningfulness. For example, even though 
one subject may think of POP as (meaning)
a soft drink, another as (meaning) a parent, 
and a third as (meaning) an exploding noise, 
the fact that all three subjects do associate 
a meaning to this trigram makes it essen- 
tially meaningful for all concerned. Mean- 
ingfulness has been equated even though 
meaning has not. Material of this sort 
should be easier to learn than trigrams to 
which the subjects have no word associate. 

A meaningfulness measure of this "fre- 
quency" type is termed (word) association 
value, and it can be measured nomothetically 
by finding the base rate of a trigram’s word 
quality among a group of subjects (e.g., 
Archer, 1960). Trigrams with high base 
rates are thus considered more meaningful 
by the experimenter than those with low 
base rates. But association value can also be
measured idiographically by having each 
subject rate whether or not he has a unique 
word associate to the trigrams he is to 
learn, regardless of the base rate for these 
materials. He may have an associate when 
most subjects do not or vice versa. In this 
case, meaningfulness is an “either-or” 
characteristic rather than a “high-low” 
characteristic of verbal materials. 
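
The difference between the two measures can be sketched in code. The yes/no judgments below are hypothetical, as are the trigram XUJ and the function names. The only thing taken from the text is the definition: the nomothetic value is a base rate across subjects, and the idiographic value is one subject's own yes-or-no judgment.

```python
from typing import Dict, List

# Hypothetical judgments: True means the subject has a word associate
# to the trigram. Each list position is one subject.
judgments: Dict[str, List[bool]] = {
    "POP": [True, True, True, True],
    "XUJ": [False, True, False, False],
}


def nomothetic_association_value(responses: List[bool]) -> float:
    """Base rate: percentage of subjects reporting a word associate."""
    return 100.0 * sum(responses) / len(responses)


def idiographic_association_value(responses: List[bool], subject: int) -> bool:
    """One subject's own either-or judgment, regardless of the base rate."""
    return responses[subject]


for trigram, responses in judgments.items():
    print(trigram, nomothetic_association_value(responses))

# The second subject has an associate to XUJ although most subjects do not.
print(idiographic_association_value(judgments["XUJ"], 1))
```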

Noble (1961) first raised the question of a 
nonspecific transfer in verbal learning, and 
he did so based on an earlier motor-learning 
study in which the specificity thesis had 
been given prominent consideration as a
theoretical rationale. We refer here to the 
work of Jones and Bilodeau (1952), who 
had found that when the subject moved 
from a difficult motor task (cloverleaf pattern
in a recessed tracking device) to a
simple motor task (circle pattern), there was
greater positive transfer than vice versa.
The question naturally arose, was this a 
specific transfer due to the fact that a circle 
practice effect was gotten via the subject's
experience in drawing cloverleafs, or was
this a form of nonspecific transfer? Noble
felt it was probably the former, but in 
drawing an analogy between the difficulty 
levels of the motor tasks and the verbal 
learning of trigrams and quasi words, he 
wondered what would be the effect of 
moving from a difficult to an easy free-recall 
task, as compared to the opposite sequence. 
Would there be a greater (positive) non- 
specific transfer across the former sequence? 

If Latin is more difficult (less meaningful)
for most students to learn than English 
(more meaningful), the implications of this 
research line for the formal discipline con- 
troversy are clear. Could it be that an ob- 
servation made by the teacher when his 
students moved from such difficult areas of 
study to relatively easier areas resulted in 
the myth of formal discipline? Noble did 
not make this connection, of course. He 
had subjects memorize highly meaningful 
(easy) and less meaningful (hard) free- 
recall lists in contrasting orders. No demon- 
strable effect for his association value 
measure could be seen in the data for either 
sequence of lists (hard-easy or easy-hard). 
So confident was Noble (1961) in his research
that he concluded, “Meaningfulness
facilitates rate of acquisition but has no 
influence upon [nonspecific] transfer of 
training [p. 209].” 

The subsequent research on this topic 
has been rather sparse. L’Abate (1962) 
contrasted subjects on high- versus low- 
nomothetic association value in a paired- 
associate task. He was primarily interested 
in the role of anxiety on learning, and his 
use of a rather involved measure of transfer 
makes his scanty findings suspect. He did 
show more nonspecific transfer for high 
association value than low association value 
across successive lists, but this was only for 
male subjects. Although a slight trend in 
favor of high association value leading to 
more nonspecific transfer over low as- 
sociation value was occasionally noted in a 
series of subsequent studies (many of which 
employed this tactic as a control condition;
see the discussion above), in most cases the
findings on association value meaningfulness 
in transfer are clearly negative (Dean & 
Kausler, 1964; Houston, 1965; Jung, 1963; 
Merikle & Battig, 1963; Stark, 1968). In 
no case was a "negative" nonspecific trans- 
fer found. 

Findings from a more recent study 
(Rychlak & Tobin, 1971) were to shift the 
question of nonspecific transfer in verbal 
learning to an alternate dimension of 
meaningfulness. We refer here to reinforce- 
ment value, which is a rating of likability 
done on learnable materials preliminary to 
entering them into tasks for the subject to 
acquire (Rychlak, 1966). The reinforce- 
ment value dimension of meaningfulness 
is viewed as an affective assessment, with 
the construct “affect” viewed as the psychological
evaluation made of the experience
by the subject. Often this assessment is 
made on the basis of positive or negative 
physiological reactions known as emotions, 
but it is not necessary to be feeling an 
emotion (physical) in order to make an 
affective (psychological) assessment of liking 
or disliking. 

A continuing effort has been made to 
prove that association value and reinforce- 
ment value are independent dimensions of 
meaningfulness. For example, even though 
Pearsonian correlations of from .54 to .84 
can be found between nomothetic ratings of 
association value and reinforcement value, 
when these same data are reanalyzed 
idiographically via chi-square, it is found 
that a serious confounding of association 
value and reinforcement value ratings takes 
place only at the lower levels of nomothetic 
ratings for trigrams (i.e., where few subjects 
have a word associate to the trigram in 
question and the ratings for reinforcement 
value are highly likely to be negative; see 
Tenbrunsel, Nishball, & Rychlak, 1968). 
At the 80% to 100% nomothetic association 
value base-rate levels, where trigrams are 
easy to identify as having literal word 
meanings, only about one trigram in five is 
rated identically by the subject as to 
association value (yes, it is a word versus 
no, it is not a word) and reinforcement 
value (liked versus disliked, respectively). 
One-week test-retest reliability for reinforcement
value is comparable to association
value (in the .90s nomothetically, with
about 35 out of 100 subjects changing their 
ratings when considered idiographically). 
Experimental evidence of the independence 
of reinforcement value from association
value in learning was readily obtained. The 
more rapid acquisition of liked over disliked
materials (termed the positive reinforcement
value effect) was demonstrated
without statistical interaction with associa- 
tion value across all levels of nomothetic 
association value and, within any such level, 
was statistically independent of idiographic 
association value measures as well (Abram- 
son, Tasto, & Rychlak, 1969). 

The positive reinforcement value facilita- 
tion has been found in paired-associate 
learning and in free-recall studies, using 
either mixed or unmixed lists. Such learning 
effects have also been found in the recogni- 
tion of designs, abstract paintings, and 
pictures of human faces which have been
prerated for reinforcement value (Rychlak, 
Galster, & McFarland, 1972). The age 
range of subjects in which reinforcement 
value effects on learning have been noted 
spans first grade through past 50 years 
(Rychlak & Saluri, in press). An interesting 
finding on abnormals (schizophrenics) is that 
such subjects either collapse the positive 
reinforcement value effect to insignificance 
or reverse it entirely (i.e., actually learn
their disliked materials more rapidly than 
their liked; Rychlak, McKee, Schneider, & 
Abramson, 1971). Finally, black females 
have been shown to learn more along rein- 
forcement value than association value 
relative to white females, who show the 
opposite predilection (Rychlak, Hewitt, & 
Hewitt, 1973). 

In the Rychlak and Tobin (1971) study, 
overachieving male high school students 
(relatively low IQ, high grades) were con- 
trasted with a matched group of under- 
achievers (relatively high IQ, low grades). 
Since this was an unmixed paired-associate- 
lists study, it was necessary to counter- 
balance for the order in which subjects took 
their tasks (liked-disliked versus disliked— 
liked). This was tantamount to an A-B, 
C-D “control” condition in the typical 
transfer study discussed above. The general
findings were that the underachiever reflected
a larger positive reinforcement value
effect than the overachiever. But in counter- 
balancing lists, it was inadvertently dis- 
covered that subjects who learned disliked 
lists before liked lists improved significantly 
more on their second list than did subjects 
who moved from a liked to a disliked list. 
Indeed, the lower IQ subjects of the sample 
actually reflected a decline in performance— 
that is, a negative (nonspecific) transfer— 
across the liked-disliked sequence. 

If one now equates negative reinforcement 
value materials with “difficult to learn” and 
positive reinforcement value materials with 
“easy to learn,” the findings of Jones and 
Bilodeau (1952) which Noble (1961) was 
unable to extend to verbal materials via the 
association value dimension are given sup- 
port. Could an affective assessment of 
meaningfulness be the historical source of 
formal discipline theory? A test of the 
hypothesis that reinforcement value con- 
tributes to nonspecific transfer while as- 
sociation value does not was called for. As 
an additional development of this research 
line, a recent criticism made of reinforcement 
value procedures was also put to test. 
Three studies are presented: the first takes 
up the criticism referred to and the last two 
confront association value and reinforce- 
ment value in nonspecific transfer paired- 
associate tasks. 


EXPERIMENT 1 


Method 
The following hypotheses were tested: 


1. Ratings made by the subject of the sort
“yes, it [the trigram] is word related and I like
it” increase in proportion as successively higher
levels of nomothetic association value are sampled.

2. Ratings of “yes, it’s word related but I
dislike it” also increase in proportion as successively
higher levels of nomothetic association
value are sampled.

3. Ratings of “no, it’s not word related and I
dislike it” increase in proportion as successively
lower levels of nomothetic association value are
sampled.

4. Ratings of “no, it’s not word related but I
like it” also increase in proportion as successively
lower levels of nomothetic association value are
sampled.
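
The four hypotheses concern how the proportions of four joint ratings shift across nomothetic levels. One way such a tally could be set up is sketched below. The ratings are hypothetical, and the category labels and function names are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical ratings: (nomothetic association value level in percent,
# subject has a word associate?, subject likes the trigram?).
ratings: List[Tuple[int, bool, bool]] = [
    (20, False, False), (20, False, True), (20, True, True), (20, False, False),
    (80, True, True), (80, True, False), (80, True, True), (80, False, False),
]


def category(has_associate: bool, liked: bool) -> str:
    """Label one of the four joint ratings named in the hypotheses."""
    word = "yes" if has_associate else "no"
    affect = "like" if liked else "dislike"
    return word + "/" + affect


def proportions_by_level(rows: List[Tuple[int, bool, bool]]) -> Dict[int, Dict[str, float]]:
    """Proportion of each joint rating within each nomothetic level."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for level, has_associate, liked in rows:
        counts[level][category(has_associate, liked)] += 1
    return {
        level: {cat: n / sum(c.values()) for cat, n in c.items()}
        for level, c in counts.items()
    }


props = proportions_by_level(ratings)
for level in sorted(props):
    print(level, props[level])
```

Each hypothesis then amounts to asking whether one category's proportion rises as the sampled level rises (Hypotheses 1 and 2) or falls (Hypotheses 3 and 4).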



Rationale. Previous work on reinforcement 
value had established that the rating of “like 
versus dislike” employed to operationalize affective 
assessments could not be reduced to either a 
nomothetic or an idiographic (frequency of) 
association value measure (Rychlak et al., 1971). 
Subjects did not associate more words to liked 
than to disliked trigrams, no matter what nomo- 
thetic level these trigrams were taken from, so that 
a “frequency” explanation of reinforcement value 
(as due to the number of association-value-related 
associations) could not be defended. The word 
associates proffered by the subjects did fall prop- 
erly into line, however, in that disliked trigrams 
suggested disliked word associates and liked 
trigrams suggested liked word associates. A 
criticism which has been raised since this earlier 
finding was reported suggests that just because 
the subject rates “liked” to a nomothetically 
low-ranging trigram (in association value) for 
which he has no word associate, it does not follow 
that all nomothetic association value factors are 
removed from potential influence on his subse- 
quent learning effort. The trigram might thus 
appear to be one of “no, it’s not a word but I like 
it,” when in fact the actual word quality for this 
trigram is much higher in a “true” base-rate 
sense than this idiographic rating would suggest. 

In other words, this criticism holds that rein- 
forcement value measures capitalize on the error 
of measurement, selecting out those trigrams 
which have a covertly higher association value 
than the base-rate norms (gleaned from other 
subjects’ judgments) currently reflect for positive 
reinforcement value and those which have a lower 
association value than these norms now reflect 
for negative reinforcement value. Learning rate 
differentials across positive and negative rein- 
forcement value would thus be explained on the 
basis of this covert artifact. 

To counter this reasonable charge, it is neces- 
sary to show that positive reinforcement value 
ratings of trigrams which are judged by subjects 
as idiographically without association value will 
be more frequent at the lower levels of nomothetic 
(base-rate) association value norms than they 
will at the higher levels (Hypothesis 4). Con- 
versely, trigrams that are judged as negative in 
reinforcement value but as having idiographic 
word quality should be more frequent at the higher 
levels of nomothetic (base-rate) association value 
norms than they are at the lower levels (Hypothe- 
sis 2). In other words, the idiographic reinforce- 
ment value judgment will be shown to go against 
the drift of nomothetic association value levels 
even as idiographic association value ratings are 
performing true to expectation. The association 
value ratings will fall or rise idiographically in 
line with the levels of nomothetic association value 
from which all trigrams are selected, but the rein- 
forcement value ratings will show a reverse tend- 
ency. How is it possible to conceive of reinforce- 
ment value as a covert nomothetic association 
value measure, capitalizing on the error of 
measurement, if the crucial negative reinforcement 
value ratings are more frequent at a higher nomo- 
thetic association value level and the crucial 
positive reinforcement value ratings are more 
frequent at a lower nomothetic association value 
level? 

The remaining two combinations of idiographic 
reinforcement value and association value are 
expected to show a common proportion change 
across the levels of nomothetic association value. 
That is, subjects can be expected to find it difficult 
to like what they have no word meaning for, so 
this rating should increase in proportion as the 
nomothetic levels of association value drop off 
(Hypothesis 3). Conversely, when a trigram has 
word quality, it is more probable that the subject 
will find a liked meaning in his associations, so 
this rating should increase in proportion as the 
nomothetic levels of association value increase 
(Hypothesis 1). Hence, the crucial hypotheses 
which help establish that reinforcement value is 
not a covert measure of either idiographic or 
nomothetic association value are Hypotheses 2 
and 4. 

Subjects. Two hundred and fifty-six under- 
graduate college students (120 males, 136 females) 
were divided by random assignment into five 
groups for the rating procedure. There were 
approximately 50 subjects per group, divided 
equally by sex. 

Procedure. Five levels of nomothetic association 
value were delineated, based upon Archer’s (1960) 
base-rate norms for consonant-vowel-consonant 
trigrams: 0%-20%, 21%-40%, 41%-60%, 61%- 
80%, and 81%-100%. Two hundred trigrams were 
randomly selected from each of these base-rate 
levels (discarding those which had reflected a sex 
difference in Archer’s survey) and then were 
administered to one of the five groups of subjects. 
Each subject thus rated 200 trigrams. In the usual 
rating procedure (see Experiments 2 and 3), 
association value and reinforcement value are 
assessed separately. However, in the present study 
a subject was asked to make both association value 
and reinforcement value ratings at the same time. 

Each subject was asked to make two concurrent 
decisions: (a) to judge whether or not the trigram 
“looks like a word, sounds like a word, or can it 
be used in a sentence?” (after Archer, 1960) and 
(b) to judge whether he liked or disliked the 
trigram on the basis of how it “sounds” as he read 
it “aloud” to himself. Thus, a subject checked 
one of the following four alternatives for each of 
the 200 trigrams he was exposed to via special 
rating forms: 

____ yes-like (yes, word related, and liked the sound) 

____ yes-dislike 

____ no-like 

____ no-dislike 

This rating procedure was repeated after a 
48-hour period had elapsed.³ Only trigrams which 
had been rated identically on both occasions were 
used in the data array and statistical analysis. 

Figure 1. The mean percentage of trigram 
ratings in four scoring categories (yes-like, 
yes-dislike, no-dislike, and no-like) as reported by 50 
subjects at each of five nomothetic association 
value levels. 


Results 


A score for each of the four rating cate- 
gories was determined for each subject. 
This reflected the percentage of the total 
number of reliable trigrams he had assigned 
to the rating category in question. Figure 1 
presents the mean percentage of trigrams 
in a scoring category, distributed across the 
five levels of nomothetic association value. 
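The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `category_percentages`, the dictionary layout, and the sample trigrams and ratings are all hypothetical and are not taken from the study's materials or data.

```python
from collections import Counter

CATEGORIES = ("yes-like", "yes-dislike", "no-like", "no-dislike")


def category_percentages(first_session, second_session):
    """Percentage of a subject's reliable trigrams in each rating category.

    Each argument maps a trigram to the category checked on that occasion.
    A trigram counts as reliable only if it was rated identically both times.
    """
    reliable = [
        rating
        for trigram, rating in first_session.items()
        if second_session.get(trigram) == rating
    ]
    if not reliable:
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(reliable)
    return {c: 100.0 * counts[c] / len(reliable) for c in CATEGORIES}


# Hypothetical ratings for one subject (invented for illustration only).
session_1 = {"BEV": "yes-like", "KUJ": "no-dislike", "MIB": "no-like",
             "TAS": "yes-like", "ZOF": "no-dislike"}
session_2 = {"BEV": "yes-like", "KUJ": "no-dislike", "MIB": "no-dislike",
             "TAS": "yes-like", "ZOF": "no-dislike"}

scores = category_percentages(session_1, session_2)
print(scores)
```

In this invented case the trigram rated differently on the two occasions is dropped, so the percentages are computed over four reliable trigrams rather than five.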

Figure 1 will be reviewed by experimental 
hypothesis. Note that the yes-like (Hy- 
pothesis 1) distribution rises with increasing 
levels of nomothetic association value (as 
measured across the five experimental 
groups of roughly 50 subjects per group). 
The yes-dislike proportion (Hypothesis 2) 
also shows a rising trend with increasing 
nomothetic association value. The no- 
dislike (Hypothesis 3) rating is considerably 
larger in proportion score at the lower 
levels of nomothetic association value than 
it is at the higher levels. Finally, the no- 
like (Hypothesis 4) rating drops rather 
decidedly at the highest level of nomothetic 
association value in comparison to the level 
it had achieved at the lowest level. 

in the usual reinforcement value learning effects. 
Indeed, it has become the standard rating procedure 
since the present studies were conducted. 

When these proportion scores were expressed 
as linear functions of the five midrange 
values of nomothetic association value 
(graphed as .10, .30, .50, .70, and .90), the 
following regression equations were found: 
$y_{yl} = .39x + .12$ for yes-like; $y_{yd} = .11x + .09$ 
for yes-dislike; $y_{nd} = -.45x + .61$ for 
no-dislike; and $y_{nl} = -.05x + .17$ for no-like. 
The correlations associated with these 
regression lines (N = 250) are .46 (p < .001), 
.22 (p < .001), −.47 (p < .001), and 
−.09 (p < .07), respectively. Thus, the 
slopes of these graphed lines were significantly 
different from zero in all but the 
no-like condition (which approximated 
significance and on a one-tailed presumption 
could be said to have reached significance). 
It would appear from Figure 1 that any 
criticism suggesting that reinforcement value 
is an artifact of nomothetic association 
value would be difficult to sustain. 
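Because the four rating categories are exhaustive, the four fitted lines should sum to about 1 at every level of nomothetic association value. The sketch below uses that constraint as a consistency check: it takes the reported lines for yes-like, no-dislike, and no-like and derives the yes-dislike line they imply. It assumes that each subject's four proportions sum to 1 and that all four regressions used the same subjects and predictor values; the names `reported` and `implied_remainder` are illustrative only.

```python
# Reported regression lines as (slope, intercept) for three categories.
reported = {
    "yes-like": (0.39, 0.12),
    "no-dislike": (-0.45, 0.61),
    "no-like": (-0.05, 0.17),
}

MIDRANGE = (0.10, 0.30, 0.50, 0.70, 0.90)


def implied_remainder(lines):
    """Line implied for the remaining category if all proportions sum to 1."""
    slope = -sum(s for s, _ in lines.values())
    intercept = 1.0 - sum(b for _, b in lines.values())
    return slope, intercept


def predicted(slope, intercept, x):
    """Predicted proportion at association value level x."""
    return slope * x + intercept


slope_yd, intercept_yd = implied_remainder(reported)
print(f"implied yes-dislike line: y = {slope_yd:.2f}x + {intercept_yd:.2f}")

for x in MIDRANGE:
    total = predicted(slope_yd, intercept_yd, x) + sum(
        predicted(s, b, x) for s, b in reported.values()
    )
    print(f"x = {x:.2f}: fitted proportions sum to {total:.2f}")
```

The implied intercept differs from the reported yes-dislike intercept of .09 by about .01, which is the size of discrepancy one would expect from rounding the coefficients to two decimals.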


EXPERIMENT 2 


Method 
The following hypotheses were tested: 


1. Idiographically assessed association value 
and reinforcement value have significant but 
independent (main) effects on acquisition in 
paired-associate tasks. 

2. Idiographically assessed association value 
fails to have interlist effects and reinforcement 
value succeeds in having interlist effects in a 
nonspecific transfer design having the paired- 
associate characteristics of A-B, C-D. 

3. When the subject moves from a disliked to a 
liked task, the greatest positive nonspecific trans- 
fer occurs. 

4. When the subject moves from a liked to a 
disliked task, the least positive transfer occurs. 


Rationale. The first two hypotheses embody 
Noble's (1961) dictum, and there is ample evidence 
from earlier research on idiographic association 
value and reinforcement value that Hypothesis 1 
is to be expected (Abramson et al., 1969; Rychlak 
et al., 1973). Hypotheses 2, 3, and 4 flow from the 
previous work on association value in transfer 
and the findings of Rychlak and Tobin (1971) 
discussed in the introduction. 

Subjects and procedure. Subjects were 64 (32 
males, 32 females) volunteer college students, 
primarily from the freshman and sophomore class 
levels. They made association value and reinforce- 
ment value ratings in a group and were then tested 
individually in the paired-associate task. A sub- 
ject was administered a mimeographed rating 
form on which 140 trigrams from the middle 
ranges of Archer's (1960) nomothetic association 
values (40%-70%) were printed (see Rychlak, 
1966, for the initial work done on these trigrams). 
On the first two occasions, the subject was 
administered the trigrams with 48 hours intervening 
and asked to answer yes or no to the question, 
“does the trigram look like a word, sound like a 
word, remind you of a word, or can you use it in a 
sentence?” (after Archer, 1960). Those trigrams 
rated yes on both occasions were taken as 
meaningful to the subject in the sense of association 
value, and those rated no were considered less 
meaningful or unmeaningful. 

One week following the association value rat- 
ings, the subject was again administered the 140 
trigrams and asked to rate them for reinforce- 
ment value, based upon how they “looked” and 
“sounded” to him as he read them to himself. 
In this case, the more usual procedure was fol- 
lowed which involves a 4-point rating of like 
much, like slightly, dislike slightly, and dislike 
much (see Experiment 1 and Footnote 3). The 
convention followed in scoring for reinforcement 
value is to administer trigrams on two occasions 
with 48 hours intervening. Only those trigrams 
rated liked or disliked (preferably, “much”) on 
both scoring occasions are then employed to 
construct the subject’s paired-associate lists. The 
unreliable association value ratings were also 
ignored by the experimenters. A series of eight 
conditions were then arrayed by the experi- 
menters, which perfectly counterbalanced associa- 
tion value and reinforcement value across two 
entirely different six-pair paired-associate lists. A 
mixed design of three between-subjects variables 
and one within-subjects variable was the result 
(2⁴ factorial). The variables were Sex × Reinforcement 
Value × Association Value × Lists. 

Subjects were randomly assigned to these eight 
conditions. In List 1, a subject might thus learn 
six pairs having the idiographic quality of associa- 
tion-value-no, reinforcement-value-liked and 
then follow this with a list of association-value-no, 
reinforcement-value-disliked. In this case, posi- 
tive reinforcement value would have been removed 
from the materials to be learned in the second 
list. In another instance, the subject might move 
across lists having the characteristics of 
association-value-no, reinforcement-value-disliked to 
association-value-yes, reinforcement-value-disliked. 
In this case, association value meaningfulness 
would have been “added into” the materials 
to be learned. It should be emphasized that both 
paired-associate members had identical meaning- 
fulness qualities; that is, so-called stimulus versus 
response meaningfulness was not under study. 

The lists were presented by memory drum, 
with three different orders of lists to counter 
serial learning cues and a 4-second exposure time. 
Method of anticipation was followed, with two 
correct anticipations for the entire list taken as 
learning criterion. Total testing time occupied 
from 20 minutes to 1 hour. In both Experiments 2 
and 3, the following steps were taken to counter 
the effects of letter overlap, rhyming, and so forth 
on rate of acquisition: phonetically similar 
trigrams (rur and cup) were not used, and if the two 
trigrams to be paired began with the same consonant 
or the first trigram ended with the consonant 
of the second, these pairings were also discarded 
(e.g., CUL and CAL or TUC and CUR). 
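The two letter-overlap rules for pairing trigrams can be sketched as a simple filter. This is a minimal illustration, not the experimenters' procedure: the function name `pair_is_allowed` is hypothetical, "the consonant of the second" is assumed to mean the second trigram's initial consonant (as in the TUC and CUR example), and the phonetic-similarity screen is not modeled because the text gives no operational rule for it.

```python
def pair_is_allowed(first, second):
    """Return False if a trigram pair breaks either letter-overlap rule.

    Rule 1: both trigrams begin with the same consonant (e.g., CUL, CAL).
    Rule 2: the first trigram ends with the consonant that begins the
            second (assumed reading of the text; e.g., TUC, CUR).
    """
    first, second = first.upper(), second.upper()
    same_initial = first[0] == second[0]
    chained = first[-1] == second[0]
    return not (same_initial or chained)


# The two discarded examples from the text, plus one hypothetical pair.
for a, b in [("CUL", "CAL"), ("TUC", "CUR"), ("BEV", "KOJ")]:
    print(a, b, "allowed" if pair_is_allowed(a, b) else "discarded")
```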


Results 


A 2⁴ factorial analysis of variance was 
computed with three between factors (sex, 
association value, reinforcement value) and 
one within factor (lists). Although females 
tended to reach criterion faster than males, 
there was no significant sex difference. A 
significant main effect was found for associa- 
tion value, with means (and standard 
deviations) as follows: association-value- 
yes, 12.11 (6.83); association-value-no, 
14.72 (7.19) (F = 6.44, df = 1/56, p < .01). 
A reinforcement value main effect was also 
apparent, as follows: reinforcement-value- 
liked, 11.77 (5.65); reinforcement-value- 
disliked, 15.06 (7.57) (F = 10.26, df = 
1/56, p = .002). A pronounced nonspecific 
(positive) transfer effect was evident, as 
follows: List 1, 16.31 (7.51); List 2, 10.52 
(4.62) (F = 29.64, df = 1/56, p < .001). 
There was no interaction between lists and 
either association value or reinforcement 
value, but this was due to the nature of this 
analysis of variance which would require a 
differential effect between lists to reflect a 
significant finding. In short, the first analysis 
of variance demonstrated that association 
value and reinforcement value each had 
effects in both lists and that these effects 
did not differ between lists. Since there was 
no interaction between association value 
and reinforcement value, the independence 
of the two measures of meaningfulness was 
completely supported. Hypothesis 1 was 
thus entirely substantiated. 

In order to test the remaining hypotheses, 
two analyses of variance were run which 
collapsed one measure of meaningfulness on 
the other, while the latter was being assessed 
sequentially, as follows: + +, + −, − +, − −. 
This called for a 2 × 4 × 2 
(Sex × Sequences × Lists) analysis of 
variance with the first two factors between 
conditions and the last a within-subjects 
condition. For example, the experimenters 
could first assess the role of 
association-value-yes (+) to association-value-yes (+), 
association-value-yes (+) to association-value-no (−), 
association-value-no (−) to association-value-yes (+), 
and association-value-no (−) to association-value-no (−) 
with reinforcement value held constant 
across these sequences. Then a second run of 
the 2 × 4 × 2 analysis of variance would 
establish these identical patterns of liked 
versus disliked reinforcement value with 
association value held constant. 


TABLE 1 

MEANS AND STANDARD DEVIATIONS OF ASSOCIATION VALUE AND REINFORCEMENT VALUE 
MEANINGFULNESS SCORES ACROSS LISTS 

                          Association value           Reinforcement value(a) 
Order of meaningfulness   List 1        List 2        List 1        List 2 
across Lists 1 and 2      M      SD     M      SD     M      SD     M      SD 
+ to +                    15.37   8.01   8.69  3.57   14.63   5.12   8.44  8.74 
+ to −                    15.06   5.33  11.88  4.53   14.31   6.19  11.19  5.28 
− to +                    16.69   6.71   9.31  3.69   19.18  10.00   9.68  4.87 
− to −                    18.13  10.47  12.19  5.41   17.13   6.74  12.75  3.75 

(a) Order and list interaction, p < .03. 

Table 1 contains the means and standard 
deviations of association value and rein- 
forcement value across lists in terms of the 
order sequences outlined. The crucial test of 
Hypothesis 2 depended upon interaction 
effects between the four order sequences and 
the two lists. This interaction failed to 
reach significance for association value (F 
= 1.69, df = 3/56). However, it did achieve 
significance on reinforcement value (F = 
3.15, df = 3/56, p = .03). There were no 
sex main effects or interactions. Note that 
the condition accounting for maximum 
(positive) transfer was when the subject 
moved from negative reinforcement value 
to positive reinforcement value and that the 
condition reflecting the least nonspecific 
transfer was when the subject moved from 
positive reinforcement value to negative 
reinforcement value. This is a cross-valida- 
tion of Rychlak and Tobin (1971). Hy- 
potheses 2, 3, and 4 are thus supported in 
the data analyses. 
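The ordering of transfer effects described above can be checked directly from the Table 1 means. The sketch below reads the table as giving List 1 and List 2 mean trials to criterion for each order sequence (an interpretation of the column layout) and defines "savings" as List 1 minus List 2 trials, so that larger values mean more positive transfer. The names `TABLE_1_MEANS` and `savings` are illustrative, and "-" stands for the minus sign in the order labels.

```python
# Mean trials to criterion as (List 1, List 2), read from Table 1.
TABLE_1_MEANS = {
    "association value": {
        "+ to +": (15.37, 8.69),
        "+ to -": (15.06, 11.88),
        "- to +": (16.69, 9.31),
        "- to -": (18.13, 12.19),
    },
    "reinforcement value": {
        "+ to +": (14.63, 8.44),
        "+ to -": (14.31, 11.19),
        "- to +": (19.18, 9.68),
        "- to -": (17.13, 12.75),
    },
}


def savings(measure):
    """List 1 minus List 2 mean trials for each order sequence."""
    return {
        order: round(list_1 - list_2, 2)
        for order, (list_1, list_2) in TABLE_1_MEANS[measure].items()
    }


for measure in TABLE_1_MEANS:
    gains = savings(measure)
    most = max(gains, key=gains.get)
    least = min(gains, key=gains.get)
    print(f"{measure}: {gains}; most = {most}, least = {least}")
```

For reinforcement value, the largest savings fall in the negative-to-positive sequence and the smallest in the positive-to-negative sequence, which is the pattern Hypotheses 3 and 4 predict.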


EXPERIMENT 3 


Method 
The following hypotheses were tested: 


1. Reinforcement value shows nonspecific 
transfer effects across four successive 
paired-associate lists. 

2. Moving from positive reinforcement value 
to negative reinforcement value results in the 
least (positive) transfer regardless of association 
value effects. 

3. Moving from negative reinforcement value 
to positive reinforcement value results in the 
greatest (positive) nonspecific transfer regardless 
of association value influences in the paired- 
associate lists. 


Rationale. If reinforcement value has interlist 
effects across two lists, it seems plausible to expect 
such effects across several lists. The order of these 
interlist effects should follow Experiment 2 as 
regards maximum and minimum gains in paired- 
associate acquisition. The relatively unimportant 
role for idiographic association value found in 
Experiment 2 was expected to continue. 

Subjects. The subjects were 16 adults (8 males, 
8 females), ranging in educational background 
from the 8th to the 12th grade. They were screened 
with the Shipley (1946) intelligence scale to rule 
out subnormal intelligence (IQ below 90). Job 
backgrounds for these subjects included factory 
work, domestic, janitorial, and practical nurse 
activities. All subjects fell in the “upper-lower” 
socioeconomic class category, based on the 
Hollingshead and Redlich (1958) index. These subjects 
were recompensed for their participation in 
Experiment 3. 

Procedure. The strategy of Experiment 3 was 
to arrange a series of four paired-associate lists 
which varied sequentially in idiographic association 
value and reinforcement value in such a way 
as to generate interlist effects according to the 
hypotheses under test. Perfect counterbalancing 
of all possible meaningfulness combinations was 
not attained, nor was it considered necessary for a 
testing of the hypotheses. The first step involved 
having the subject idiographically rate 200 tri- 
grams selected from roughly the 70% level of 
Archer’s (1960) norms for both association value 
and reinforcement value (exactly as in Experiment 
2). Pretesting had established this to be the most 
satisfactory level of difficulty (nomothetic associ- 
ation value) for subjects of this general back- 
ground. 

Next, four different six-pair lists were arrayed 
for the subject, having the characteristics of 
(A) association-value-yes, positive reinforcement 
value (liked); (B) association-value-no, positive 
reinforcement value; (C) association-value-yes, 
negative reinforcement value (disliked); and 
(D) association-value-no, negative reinforcement 
value. Both paired-associate members of each of 
these lists had the same association-value-reinforcement-value 
meaningfulness combinations; 
that is, so-called stimulus versus response 
meaningfulness was not at issue. Hence there were 
four unmixed lists, administered to the subject in 
one of four sequences (see Figure 2). The males 
and females were assigned to the conditions ran- 
domly within the sex breakdown, so that two 
members of each sex were performing in each of 
the four sequences. Lists were presented via 
memory drum, with three randomized orders used 
to counteract serial learning cues and a 4-second 
exposure time. Subjects were given three “warm-up” 
trials on a practice list. Method of anticipation 
was followed, and learning criterion was again 
two successive correct anticipations on the entire 
six-pair list. Experimental periods took from 45 
minutes to 1¼ hours, with an ample “break” 
given between lists for the subject to go to the 
lavatory, smoke a cigarette, and so forth. 


Results 


Figure 2 presents the means and standard 
deviations of trials to reach criterion in 
graph form, broken down into the four 
association-value-reinforcement-value sequence 
conditions (I, II, III, IV). The four 
graphs of Figure 2 thus trace the interlist 
performance of four subjects (two males, two 
females) across varying sequences of asso- 
ciation-value-reinforcement-value meaning- 
fulness (symbolized by the above A, B, C, 
D coding). Nonspecific transfer was assessed 
statistically in two ways. 

First, difference scores were obtained for 
each subject based on his successive per- 
formances across his four lists. These data 
were then submitted to a 2 (Sex) X 4 (A, B, 
C, D) factorial analysis of variance, with the 
following results: (a) there was no main sex 
or interaction effect; (b) there was a differ- 
ence score main effect between Lists 1 and 2 
(F = 4.69, df = 3/8, p < .05); (c) there was 
a difference score main effect between Lists 
2 and 3 (F = 6.32, df = 3/8, p < .05); and 
(d) the difference score main effect between 
Lists 3 and 4 was nonsignificant. Second, as 
an overall estimate of transfer effects, each 
of the four sequences was taken as repeated 
measures and a 4 (A, B, C, D) X 4 (I, II, 
III, IV) analysis of variance was run. There 
were no main effects in this analysis, but the 
expected interaction finding did materialize 
(F = 2.72, df = 9/36, p < .05). 
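The difference scores used in the first analysis can be sketched as follows. The sign convention (later list minus earlier list, so that a negative score means fewer trials and hence positive transfer) is an assumption, since the text does not state one, and the trials-to-criterion values are hypothetical rather than data from the study.

```python
def successive_differences(trials):
    """Difference scores between each pair of successive lists.

    `trials` holds one subject's trials to criterion on Lists 1 to 4.
    Assumed convention: later minus earlier, so negative = faster learning.
    """
    return [later - earlier for earlier, later in zip(trials, trials[1:])]


# Hypothetical subject: trials to criterion on four successive lists.
subject_trials = [20, 14, 19, 12]
diffs = successive_differences(subject_trials)
print(diffs)  # one score each for Lists 1-2, 2-3, and 3-4
```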

Interlist variations were thus taking place, 
and careful inspection of Figure 2 provides 
evidence in support of the experimental 
hypotheses. The only condition in which 
association value performs in consonance 
with a nonspecific transfer suggestion is D 
to A (see Graphs II, III, and IV). In this 
case, the list succession moves from asso- 
ciation-value-no to association-value-yes; 
however, the reinforcement value succession 
of D to A is also moving in the predicted 
negative to positive (maximum positive 
transfer) direction. The A to B (Graphs I, 
III, and IV) and C to D (Graphs I, II, and 
III) sequences on association value predict 
negative transfer but are not consistently 
found to be so. The major confrontation of 
association value and reinforcement value 
comes within the B to C sequence of lists 
(Graphs I, II, and IV), where reinforcement 
value is moving from positive to negative 
and association value is moving from no to 
yes. In this case, association value predicts a 
positive and reinforcement value a negative 
transfer across lists. Note that in all three 
instances there is a pronounced negative 
transfer taking place. 
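The competing predictions for each list transition can be laid out mechanically. In the sketch below, the list codes A to D and their association value and reinforcement value qualities follow the Procedure section; the function names and the simplified labels "positive", "negative", and "no change" are illustrative, with "positive" standing for a predicted gain in transfer when a measure moves from its lower to its higher level.

```python
# Idiographic qualities of the four lists in Experiment 3:
# (association value, reinforcement value).
LISTS = {
    "A": ("yes", "positive"),
    "B": ("no", "positive"),
    "C": ("yes", "negative"),
    "D": ("no", "negative"),
}

# Higher rank = more meaningful on that measure.
AV_RANK = {"no": 0, "yes": 1}
RV_RANK = {"negative": 0, "positive": 1}


def direction(before, after):
    """Direction of predicted transfer when a measure changes level."""
    if after > before:
        return "positive"
    if after < before:
        return "negative"
    return "no change"


def predicted_transfer(from_list, to_list):
    """Transfer direction predicted separately by each measure."""
    av_from, rv_from = LISTS[from_list]
    av_to, rv_to = LISTS[to_list]
    return {
        "association value": direction(AV_RANK[av_from], AV_RANK[av_to]),
        "reinforcement value": direction(RV_RANK[rv_from], RV_RANK[rv_to]),
    }


for a, b in [("D", "A"), ("A", "B"), ("C", "D"), ("B", "C")]:
    print(f"{a} to {b}: {predicted_transfer(a, b)}")
```

Only the B to C transition sets the two measures against each other, which is why the text treats it as the major confrontation.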


Discussion 


It is of theoretical importance that we now 
have evidence for both positive and negative 
“nonspecific” transfer to parallel what is 
already accepted as fact for specific transfer. 
To use the A-B, C-D sequence as a “control,” 
on the assumption that this can track 
only positive interlist learning effects, may 
be premature. The customary explanation 
for variations across tasks which the subject 
either likes or dislikes is that his motivation 
level fluctuates accordingly. The subjects of 
Experiment 3 might thus have simply been 
“trying harder” on those lists which they 
liked, and this accounted for the differential 
rate of learning across the four lists. However, 
there was no reason (or “reward”) for these 


148 J. F. RYCHLAK, N. DUC TUAN, AND W. E. SCHNEIDER 


[Figure 2: four graphs, one per sequence condition (I, II, III, IV); abscissa: successive paired-associate lists, 1 to 4.] 

Figure 2. The means (solid line) and standard deviations (broken line) of 16 subjects are presented 
in separate graphs of 4 subjects each according to a different sequence of four paired-associate lists. (The 
conditions I, II, III, and IV demonstrate the overall order of reaching criterion, and the 
association-value-reinforcement-value combinations to be found in a specific list along the abscissa are symbolized 
by A, B, C, and D. The standard deviations are scaled on the ordinate raw score units for ease in 
presentation. Abbreviations: AV = association value; RV = reinforcement value; pos. = positive; and 
neg. = negative.) 




subjects to delay completion of the total 
experiment. Payment was made at the close 
of study. Interviews with the subjects fol- 
lowing the task convinced the experimenters 
that none were aware of the experimental 
manipulation. Though all trigrams were 
from the 70% nomothetic level of associa- 
tion value, the subjects definitely felt that 
some lists were more difficult than others. 
Experiment 1 counters any suggestion that 
the experimenters might have somehow pre- 
selected aberrant nomothetic trigrams for 
these lists. To argue that the subjects are 
motivated to try harder—and hence do 
better—on easier tasks results in a cir- 
cularity. 

Experiment 2 is clearly the stronger of the 
two studies on transfer, due to the perfect 
counterbalancing of conditions achieved. 
Here again, in light of the findings of Experi- 
ment 1, it does not seem profitable to begin 
speculating on covert nomothetic association 
value factors which somehow crept in to 
determine the results of this confrontation 
between idiographic association value and 
reinforcement value. It is disconcerting to 
traditional learning theory that a two-way 
transfer takes place across lists which have 
nothing observably “in common” (no stimu- 
lus or response similarities). Yet, this seems 
to be the case in the realm of affective as- 
sessment and—as suggested in the introduc- 
tion—it is what unites reinforcement value 
and formal discipline theory. Both concep- 
tions view the performing subject as having 
an active contribution to make to the learning 
process. 

This side of formal discipline theory is 
often overlooked. Due to the stereotype of a 
mentalistic “muscle training,” too many 
psychologists assume that the advocates of 
formal discipline believed that effort expended 
in a task to develop “faculties” 
could be passively induced. Actually, the 
proponents of formal discipline held that a 
teacher could do nothing for the student, who 
had to bring about his own (faculty) development 
(see, e.g., Roark, 1895, p. 278). The 
analogy to muscles was drawn along “active, 
intentional use” lines and not along “power 
through practice” lines. A student could be 
shown the way by the teacher and hopefully 
be aided when questions arose, but simply to 
force meaningless repetitions on him in hopes 
of furthering his “mental muscles” was not 
a tenet of formal discipline theory. In this 
context, it is possible to see how a rein- 
forcement value effect across the less liked 
but dutifully performed Latin lesson might 
have seemed to generate by comparison a 
more talented performance in the better 
liked English or history lesson to follow. 
This reinforcement value sequencing of 
lesson content could actually be tested in a 
classroom situation. 

It is a challenge to psychology to find out 
precisely what constitutes judgments of 
“hard” versus “easy” material. Informal 
observation of students at all school levels 
suggests to the experimenters that an af- 
fective factor is surely prominent in such 
statements—one that is more than simply a 
motivational variation of the type discussed 
above. This performance differential is 
often tied to conceptions such as “working 
at one's pace” or “self-selection of study 
topics” in the open classroom and so forth. 
It could be that the ultimate benefit of 
these so-called liberal classroom tactics is 
that they afford the student an opportunity 
to delve lightly into a disliked study topic 
(possibly by way of a teacher’s suggestion or 
example) and then to devote considerable 
subsequent effort to a preferred study task 
(thus maximizing positive nonspecific trans- 
fer for the latter, which continues to be 
“easy” as greater gains are made). Or, the 
student may devote his entire time to vari- 
ous liked subjects (a sequence of learning 
which also results in pronounced positive 
transfer). As the total school effort is ar- 
ranged to maximize facilitation of learning, 
the child may even come to reevaluate his 
formerly disliked subjects and place them in 
a more favorable light. 

To undertake an extended discussion of 
the theory which sustains reinforcement 
value research would go beyond the aims of 
the present paper.⁴ Suffice to say, it is assumed 
that in the act of assessing items 
which are learnable (trigrams, words, pictures, 
etc.), the subject equates the affective 
assessment he makes of himself with the affective 
assessment he makes of these materials, 
in a tautological fashion (liked item 
to liked self, or disliked item to disliked self). 
Such meaningful relations are thought to 
function “within time,” so that explanations 
relying on practice, rehearsal, learning how 
to learn, or warm-up are not considered 
viable accounts of the reinforcement value 
phenomenon. These concepts are best left to 
explain association value learning effects. 
Reinforcement value indicates what will be 
found meaningful (hence “known”) by the 
subject as he “comes at” a novel experience. 
Knowing an item in this sense means it will 
reach learning criterion in a practice series 
before the unknown items do. But this does 
not make the seeming acquisition over time 
a function of repetition. In one sense, and to 
a certain extent, the “acquisition” is made 
possible due to the preliminary grasp 
(knowing) which the subject brings to the 
situation beforehand as an affective assessment. 


4 Requests for a complete theoretical statement 
of the “logical learning theory” underwriting 
reinforcement value research should be sent to the 
first author. 


CONSTRUCT VALIDITY OF TEST ITEMS MEASURING 
ACQUISITION OF INFORMATION FROM LINE GRAPHS 


JAY R. PRICE,¹ VICTOR R. MARTUZA, AND JAMES H. CROUSE 


University of Delaware 


Research on the effectiveness of graphical displays for information 
acquisition and retention lacks a system for classifying graph infor- 
mation and generating test items to assess learning. The purpose of 
this study was to validate a system based on two types of informa- 
tion and three types of informational units. Results of an analysis 
of variance indicated differences in learning predictable from the 
classification system; however, a multitrait-multimethod matrix 
analysis failed to provide evidence of trait validity for the system’s 
informational constructs. In light of these results, a graph information-processing
strategy was proposed in which subjects utilize data point information.


The present study deals with the acquisi- 
tion and retention of quantitative informa- 
tion from a line graph stimulus. While the 
acquisition of quantitative information from 
graphical displays is an important com- 
ponent of school learning, the processes 
involved in such situations have been 
studied only infrequently (cf. Schutz,
1961; Washburne, 1927). The present study 
is particularly concerned with three aspects 
of learning from a line graph stimulus: (a) 
the nature of the informational unit(s) 
processed by subjects instructed to learn the 
information in the graph, (b) the relation- 
ship between the number of informational 
units upon which a test item is based and 
the accuracy of subject performance on that 
item, and (c) the relationship between study 
time and acquisition of information from the 
graph. 

In attempting to measure the acquisition 
of information from a line graph stimulus, 
the first question that arises concerns the 
nature of the informational units processed 
by the subject. A logical distinction exists 
between point and slope information. In a 
line graph, a unit of point information is 
the value of the dependent variable asso- 
ciated with a specific level of the independent


1 Requests for reprints should be sent to Jay R.
Price, College of Education, University of Dela-


variable; a unit of slope information is
the change in value of the dependent variable 
per unit change in the independent variable 
associated with a specific, contiguous set of 
independent variable levels. The question of 
immediate interest is whether this logical 
distinction is a meaningful psychological 
distinction; that is, when instructed to
learn the information in a line graph, do 
subjects encode point and/or slope in- 
formation? If subjects do, in fact, store 
point and slope information independently, 
then point and slope information can be 
viewed as informational constructs in much 
the same way that personality constructs 
are viewed; thus, it should be possible to 
validate items measuring these informa- 
tional constructs by means of multitrait- 
multimethod methodology (Campbell & 
Fiske, 1959). 

The second question of interest concerns 
the relationship between the number of 
informational units required for correct 
performance on items at recall and accuracy 
of subject performance on these items. 
Studies by Schutz (1961) and Washburne 
(1927) are tangentially related to this ques- 
tion, but because of differences in procedure, 
task instructions, and type of item presenta- 
tion format, the studies do not lead directly 
to expectations for the present experiment. 
However, it would seem that the greater the 
number of informational units required by
an item at recall, regardless of the type of
unit involved, the poorer performance 
should be on the item. 

The third question of interest concerns 
the effects of study time on information 
acquisition. The purpose here was to extend 
the research on study time into the area of 
learning quantitative information from 
graphical materials. It was expected, as 
most studies have shown, that increased 
study time would result in greater acquisi- 
tion. Of greatest interest, however, were the 
possible interactions of study time with the 
type of informational units and with the 
number of informational units that were
required for successful performance on the 
test items at recall. 


METHOD 


Subjects 


Thirty-six undergraduate education student 
volunteers served as subjects in this experiment. 


Materials 


A multiple-line graph was constructed in which 
the average value per share of stock for each of 
three fictitious companies was plotted for each of 
five successive years. Each of the three lines (one
per company) was generated randomly, subject to 
the following constraints: (a) One line would show 
an increasing trend; (b) the second line would 
show a decreasing trend; and (c) the third 
would show random fluctuations. To generate the 
data points for the first two of these lines, the data 
point values were randomly sampled from the 
following five strings of digits: 0-5, 1-6, 2-7, 3-8, 
and 4-9. For the increasing trend line, the first 
digit was randomly selected from the 0-5 interval. 
The next four digits were randomly selected from 
the four succeeding digit strings. For the decreasing
trend line, the first digit was randomly selected
from the 4-9 interval. The next four were ran- 
domly selected from the remaining intervals in 
sequence. The five values for the third line were 
randomly selected from the 0-9 range subject to 
the restriction that there would be exactly one 
intersection or crossover of lines in the left, center, 
and right thirds of the graph. 
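
The sampling scheme for the two trend lines can be sketched as follows. This is only an illustration under stated assumptions, not the procedure as the authors ran it: the function names, the seed, and the use of Python's `random` module are all hypothetical, and the crossover restriction on the third line is not enforced because the text does not say how crossovers were counted.

```python
import random

# Overlapping digit intervals quoted in the text: 0-5, 1-6, 2-7, 3-8, 4-9.
INTERVALS = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]


def trend_line(rng, increasing=True):
    """Draw one value per year, walking the intervals upward or downward."""
    order = INTERVALS if increasing else list(reversed(INTERVALS))
    return [rng.randint(low, high) for low, high in order]


def fluctuating_line(rng, n_years=5):
    """Draw the third line from the 0-9 range.

    The crossover restriction described in the text is NOT enforced here.
    """
    return [rng.randint(0, 9) for _ in range(n_years)]


rng = random.Random(1974)  # arbitrary seed, used only for reproducibility
rising = trend_line(rng, increasing=True)
falling = trend_line(rng, increasing=False)
third = fluctuating_line(rng)
print(rising, falling, third)
```

Because adjacent intervals overlap, a line drawn this way trends upward or downward on average without being strictly monotonic from year to year.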

The criterion test consisted of six subtests of 
eight propositions each. Three subtests were 
based on point information, the rest on slope in- 
formation. Within each information type, the 
three subtests were based on a single unit of infor- 
mation, two units arranged vertically (i.e., the 
price of stock for two companies during the same 
year) and two units within the same line (i.e., the 
price of a single company's stock for two separate 
years). Following the lead of Anderson (1972), 
Bormuth (1970), and Cronbach (1971), basic sen- 
tence frames were formed for each item type (see 
Table 1), and rules were established to generate 
the items in each cell. 

For example, the rules for the point items based 
on a single unit of information are listed below: 

1. Company names for the eight items were 
selected randomly with the restriction that each 
company name was used at least twice and no 
more than three times. 

2. The year values for the eight items were 
chosen randomly with the restriction that each 
year value was used at least once and no more than 
twice. 

3. The comparative (greater than — less than) 
was assigned randomly to the items so that each 
appeared in four items of the subtest. 

4. Within the four items containing the
“greater than” comparative, the truth value was
randomly assigned such that two propositions
would be true and two would be false. The same
procedure was used for the four “less than” comparative
items.

5. For each item, the set of stock values that 
would satisfy the truth value for that item was 
determined, and one element of the set was ran- 
domly selected for inclusion in the item. 
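
The five rules above can be sketched in code. This is a rough illustration under stated assumptions: the company names, year labels, graph values, the 0-9 range of candidate comparison values, and every function name are hypothetical and do not come from the study materials.

```python
import random
from collections import Counter

COMPANIES = ["A", "B", "C"]   # hypothetical company names
YEARS = [1, 2, 3, 4, 5]       # hypothetical year labels
# Hypothetical graph data: value per share for each company in years 1-5.
GRAPH = {
    "A": [2, 3, 5, 6, 8],
    "B": [8, 6, 5, 4, 1],
    "C": [4, 7, 3, 6, 2],
}


def balanced_pool(levels, n_items, rng):
    """Spread n_items over the levels as evenly as possible, in random order."""
    base, extra = divmod(n_items, len(levels))
    lucky = set(rng.sample(levels, extra))  # levels that get one extra use
    pool = [lvl for lvl in levels for _ in range(base + (lvl in lucky))]
    rng.shuffle(pool)
    return pool


def satisfying_values(actual, comparative, truth, candidates=range(10)):
    """Rule 5: all candidate values that give the proposition this truth value."""
    def holds(value):
        return actual > value if comparative == "greater than" else actual < value
    return [v for v in candidates if holds(v) == truth]


def make_point_single_unit_items(rng, n_items=8):
    names = balanced_pool(COMPANIES, n_items, rng)   # rule 1
    years = balanced_pool(YEARS, n_items, rng)       # rule 2
    # Rules 3 and 4: four items per comparative, two true and two false each.
    cells = [(c, t) for c in ("greater than", "less than")
             for t in (True, True, False, False)]
    rng.shuffle(cells)
    items = []
    for name, year, (comparative, truth) in zip(names, years, cells):
        actual = GRAPH[name][year - 1]
        options = satisfying_values(actual, comparative, truth)
        if not options:
            raise ValueError("no candidate value satisfies this item")
        items.append({"name": name, "year": year, "comparative": comparative,
                      "truth": truth, "value": rng.choice(options)})
    return items


for item in make_point_single_unit_items(random.Random(0)):
    print(item)
```

With eight items, the even spread in `balanced_pool` uses each of three companies two or three times and each of five years once or twice, which matches the bounds in rules 1 and 2.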

It is apparent from the above rules that items 


TABLE 1

TEST ITEM FRAMES: INFORMATION TYPE AND NUMBER OF INFORMATIONAL UNITS

Point
  Single unit: The average value per share of (name) stock was (±comparative) than (value) during the year (year).
  Two units within occasion: The average value per share of (name) stock was (±comparative) than the average value of (name) stock during the year (year).
  Two units within group: The average value per share of (name) stock was (±comparative) in (year) than in (year).

Slope
  Single unit: The average value per share of (name) stock (verb) during the period (years).
  Two units within occasion: The average value per share of (name) stock changed (comparative) rapidly than the value of (name) stock during the period (years).
  Two units within group: The average value per share of (name) stock changed (comparative) rapidly during the period (years) than during the period (years).


within each subtest were balanced for wording of
comparatives (e.g., greater than — less than, more 
rapidly - less rapidly, increased — decreased) and 
truth value. With respect to wording of comparatives,
a number of researchers (e.g., Clark, 1970;
Trabasso, 1971) have shown that positive and
negative wordings of test items impose different
information-processing requirements on subjects 
with resulting differences in performance levels. 
These results as well as those on acquiescent re- 
sponding suggested that items should be balanced 
for comparative wording and truth value so that 
comparisons of interest would not be differentially 
contaminated by differences in responding. 

Analogous procedures were used for generating 
each of the five remaining item types. The items 
were then randomly ordered over the test as a 
whole, subject to constraints necessary for guaran- 
teeing that the distribution of the various item 
characteristics described above would be even 
across the test as a whole. 

The graph and test items were reproduced on 
standard 8½ × 11 inch sheets of paper and bound
in a seven-page test booklet. A cover sheet for 
subject identification was followed by the graph. 
A blank sheet followed the graph and separated it 
from the three pages of test propositions to prevent
the subjects from seeing the graph at test
time. A final cover sheet completed the test booklet.


Procedure 


The subjects were randomly assigned in equal 
numbers to the 2- and 8-minute study time condi- 
tions. Following distribution of the materials, 
instructions were read to the subjects that (a) 
indicated the purpose of the study, (b) specified
both the study time and test time limits, (c) in- 
formed them that the graph could not be used as a 
reference once the prescribed study time had 
elapsed, and (d) instructed them to answer all 
items. Subjects were told they had up to 40 min- 
utes to complete the test items. As it turned out, 
no one required more than 25 minutes to complete 
the test items.


RESULTS


The number of correct responses per item 
type was determined for each subject. These 
data were then analyzed as a one-between, 
three-within factorial analysis of variance. 
The between factor was study time and the 
within factors were information type, num- 
ber of informational units, and wording of 
logical opposite pairs. Table 2 contains the 
means and standard deviations for this 
analysis. 

All four main effects were significant, 
while none of the interactions were sig- 
nificant. The mean score in the eight-min- 
ute study condition was higher than was the 
mean in the two-minute condition (F =
10.90, df = 1/34, p < .01). The mean score
on point information items was significantly
higher than was the mean on slope information
items (F = 6.18, conservative df =
1/34, p < .02). Scheffé tests on the three
information unit means indicated that the 
mean of single-unit items was higher than 
were the weighted means of the two unit- 
within-occasion and two unit-within-group 
items (p < .01); however, the means of 
the latter two item types were not sig- 
nificantly different from each other (p > 
.05). The mean performance on items stated 
positively (greater than, increase, more 
rapidly) was significantly higher than was 
the mean performance on items stated 
negatively (F = 6.16, conservative df = 
1/34, p < .02). 

To assess the relationship between per- 
formance on information types and number 
of data points required to answer an item 


TABLE 2

MEANS AND STANDARD DEVIATIONS FOR THE STUDY TIME × INFORMATION TYPE × UNITS OF
INFORMATION × WORDING INTERACTION

                |                  Point                  |                  Slope
                |  One unit   | Two within  | Two within  |  One unit   | Two within  | Two within
Study condition |             |    group    |  occasion   |             |    group    |  occasion
                |  +   |  -   |  +   |  -   |  +   |  -   |  +   |  -   |  +   |  -   |  +   |  -
Two minute
  Mean          | 3.22 | 2.89 | 3.06 | 2.61 | 3.11 | 2.72 | 3.17 | 2.94 | 2.56 | 2.67 | 2.44 | 2.28
  SD            | 1.08 |  .99 |  .71 | 1.11 |  .93 |  .87 | 1.01 |  .91 |  .89 | 1.00 |  .95 | 1.15
Eight minute
  Mean          | 3.80 | 3.33 | 3.11 | 3.06 | 3.56 | 3.56 | 3.28 | 3.44 | 3.28 | 2.80 | 3.44 | 3.11
  SD            |  .31 |  .82 |  .93 | 1.03 |  .60 |  .60 |  .81 |  .60 |  .86 | 1.15 |  .84 |  .98


TABLE 3

COMPARISONS AMONG SUBTEST MEANS

No. of data | Information | No. of units      | Mean  | Subtest |                 Subtest
points      | type        |                   |       |         |  6   |  5   |  2   |  4   |  3   |   1
4           | Slope       | 2 within occasion | 5.639 |    6    |      | .055 | .278 | .778 | .833 | 1.028*
3           | Slope       | 2 within group    | 5.694 |    5    |      |      | .222 | .723 | .778 |  .973*
2           | Point       | 2 within group    | 5.917 |    2    |      |      |      | .500 | .555 |  .750
2           | Slope       | 1                 | 6.417 |    4    |      |      |      |      | .055 |  .250
2           | Point       | 2 within occasion | 6.472 |    3    |      |      |      |      |      |  .195
1           | Point       | 1                 | 6.667 |    1    |      |      |      |      |      |

* p = .01


successfully, the six subtest means (In- 
formation Type X Number of Units) were 
analyzed as a one-between, one-within 
analysis of variance (Time X Subtest). The 
two main effects were significant; the in- 
teraction was not. The number of data 
points and subtest means as well as the 
significant comparisons by the Newman- 
Keuls procedure are contained in Table 3. 
This analysis indicated that only the mean 
of the point-single-unit test differed sig- 
nificantly from the means of the slope- 
within-occasion and slope-within-group 
tests. 
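
The pairwise comparisons summarized in Table 3 rest on differences between ordered subtest means. The sketch below recomputes those differences from the six means reported in the table. It does not reproduce the Newman-Keuls critical values, and the function and variable names are hypothetical. Because it works from rounded means, a result may differ from the tabled value in the third decimal place.

```python
from itertools import combinations

# Subtest means as reported in Table 3, keyed by subtest number.
SUBTEST_MEANS = {
    6: 5.639,  # slope, two units within occasion
    5: 5.694,  # slope, two units within group
    2: 5.917,  # point, two units within group
    4: 6.417,  # slope, single unit
    3: 6.472,  # point, two units within occasion
    1: 6.667,  # point, single unit
}


def pairwise_differences(means):
    """Return {(lower, higher): difference} for every pair, ordered by mean."""
    ordered = sorted(means, key=means.get)
    return {
        (a, b): round(means[b] - means[a], 3)
        for a, b in combinations(ordered, 2)
    }


diffs = pairwise_differences(SUBTEST_MEANS)
largest = max(diffs, key=diffs.get)
print(largest, diffs[largest])
```

The largest difference is the one between the point single-unit subtest and the slope within-occasion subtest, which is one of the two comparisons the text reports as significant.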

In order to assess possible effects of 
response sets, the data were reanalyzed 
with study time, logical opposite pairs, and 


TABLE 4

MULTITRAIT (POINT AND SLOPE INFORMATION TYPE)-MULTIMETHOD (NUMBER OF
INFORMATIONAL UNITS) MATRIX

                     |        1        |        2        |        3
Measure              | Point  | Slope  | Point  | Slope  | Point  | Slope
1. Single unit
   Point             | (.42)b |
   Slope             |  .07   | (.47)  |
2. Within group
   Point             |  .93   |  .80   | (.48)  |
   Slope             | 1.00   |  .62   | 1.00a  | (.87)  |
3. Within occasion
   Point             |  .15   | 1.00a  |  .91   |  .88   | (.48)  |
   Slope             |  .45   |  .76   |  .20   |  .21   |  .31   | (.57)


a Actual corrected values greater than 1.

b Numbers within parentheses represent the reliability estimate according to the Kuder-
Richardson Formula 20.


truth value as the independent variables. 
The only significant results were those 
main effects associated with study time and 
logical opposite wording. The fact that all 
interaction effects were nonsignificant 
seemed to rule out acquiescence as a possible 
explanation for the results obtained in the 
initial analysis discussed above. 

Table 4 contains the multitrait-multi- 
method matrix with number of informa- 
tional units representing the methods, and 
point and slope information being the pos- 
sible constructs. Correlation coefficients 
appearing in the table have been corrected 
for attenuation. The overall pattern of 
coefficients in the matrix does not support 
our hypothesis that the point and slope 
items included in this criterion test measure 
two distinct informational constructs. 
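
The two computations behind Table 4 can be sketched briefly. This is a hedged illustration, not the authors' analysis: the function names are hypothetical, the KR-20 version shown uses the population variance of total scores (some treatments use the sample variance), and the observed correlation in the example is invented for illustration.

```python
from math import sqrt
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per subject)."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects  # proportion correct
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Divide an observed correlation by the root of the two reliabilities."""
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical observed correlation paired with two reliabilities of the
# size reported in Table 4 (.42 and .48).
print(correct_for_attenuation(0.50, 0.42, 0.48))
```

When reliabilities are this low, a moderate observed correlation can yield a corrected value above 1, which is the situation flagged by footnote a of Table 4.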


DISCUSSION


The results of the initial analysis in- 
dicated significant main effects for study 
time, wording, number of informational 
units, and informational types. The effect of 
informational types suggested that the 
point-slope dichotomy was a meaningful 
distinction; however, the multitrait-multi- 
method matrix failed to support this dis- 
tinction: Performance on the various point 
and slope subtests predicted performance 
on subtests both within and between these 
two informational constructs. 

An explanation for the disparate results 
of these two analyses may lie in the kind of 
information subjects encoded and/or re- 
trieved under the experimental instructions 
and conditions of this study. It is possible 
that subjects did not use slope information
as defined in this study but instead used
only data point information. To answer 
slope items, subjects recalled point informa- 
tion and then constructed slope information 
from the recalled points. The reasoning that 
follows supports this conclusion. 

Slope items are apparently more dif- 
ficult than point items. If slope performance 
is a function of a subject's recall of data 
points, then an increase in the number of 
data points needed for successful per- 
formance should be accompanied by a de- 
crease in performance level. From Table 3, 
it is apparent that this inverse relationship 
exists; subjects’ scores tend to decrease as 
the number of data points increases. 

Consequently, it appears that the amount 
of data point information may be a more 
important factor than informational type 
in determining a subject’s performance 
level given the proposed information- 
processing strategy. However, the present 
findings do not rule out the possibility that 
under other experimental instructions and 
conditions, subjects might encode slope 
information. If this were the case, then the 
present multitrait-multimethod methodol- 
ogy would be suitable for providing evidence 
of the encoding of slope information and the
validity of the slope informational construct.









I STOP-—LOOK |! 


BUILDING SKILLS FOR 
COMPETENCY-BASED TEACHING 


Leo D. Leonard and Robert T. Utz 


Here is the first book to develop the basic skills for building a competency-based curriculum by using 
its own methodology as a model. It begins with an explanation of the characteristics of competency- 
based education and a pretest covering the basic skills. For each type of skill—developing student 
self-discipline, applying learning concepts, formulating educational objectives, developing a cur- 
riculum, and evaluation—the relevant behavioral objectives are identified and discussed using 
numerous sample problems, and the reader is guided in evaluating himself. Finally, the book asks the 
student to synthesize the basic skills and to develop a competency-based curriculum for his own 
particular area of teaching. Charts; glossary; chapter bibliographies. February 1974. Tentative: 248 


pages; $5.95/paper. 


COGNITIVE PROCESSES 
IN EDUCATION: 


A Psychological Preparation for Teaching 
and Curriculum Development 


Sylvia Farnham-Dig gory 


Devoted exclusively to the study of psychology as it applies directly to the methods and problems of 
American education, this book explains why such recent developments as the new math and the new 
social studies are psychologically more meaningful than traditional programs. Relevant research 
findings are compiled and used to formulate a psychological rationale for better education. Case his- 
tories illustrate the author’s development of such themes as impulsivity, dependence, prejudice, and 
rigidity in the classroom, Actual curriculum examples and materials illustrate explanations. Published. 
630 pages; $12.50. Instructor's Manual. Study Guide: $1.95. 


LEARNING THEORIES FOR TEACHERS, 


Second Edition 
Morris L. Bigge 


A balanced presentation of early and contemporary learning theories and their implications for teach- 
ing, this book avoids oversimplification of the basic tenets and describes similarities and differences 
among theories to guide the student in critically constructing and evaluating his own outlook on the 
nature of learning, Discussion of the strengths of each theory is designed to confront the reader with 
the need to choose among competing psychological points of view. Weaknesses are left for the reader 
to discern. Published. 358 pages; $5.35/paper. 


Harper & Row 


offers these texts and many 
more=> 


FOR A VARIETY 


PSYCHOLOGICAL CONCEPTS 
IN THE CLASSROOM 
Richard H. Coop and Kinnard P. White 


Organized around a set of scenarios, this text describes three types of schools in American 
society—an inner city school, a small town-rural school, and a school in a wealthy suburban 
area—and presents case studies drawn from each. Eight specific conceptual areas are treated: 
power and influence in the classroom, self-concept, teaching expectancy, behavior modifica- 
tion, locus of control, motivation, intelligence, and information processing. Each chapter 
presents an overview of the research in a given area and relates the evidence directly to the 
school scenarios or the individual case studies. Problems and study questions; illustrations; 
and bibliography. February 1974. Tentative: 320 pages; $4.95 /paper. 


LOOKING IN CLASSROOMS 
Thomas L. Good and Jere E. Brophy 


The need for teachers to become more aware of their classroom behavior is the central theme 
of this book. It surveys recent advances in educational research and gives detailed advice 
about effective teaching. Among the subjects covered: Teachers' attitudes and expectations, 
the teacher as a behavioral model, classroom organization and management, ability grouping, 
and peer tutoring. Published. 397 pages; $4.95 /paper. 


MAINTAINING SANITY 
IN THE CLASSROOM: 
Illustrated Teaching Techniques 


Rudolf Dreikurs, Bernice Bronia Grunwald, and 
Floy C. Pepper 


Based on the psychology of Alfred Adler and heavily case-oriented, this text applies techniques 
of motivation modification to classroom problems. The authors systematically utilize 
their own case material to illustrate teaching problems and how to deal with them. Published. 
338 pages; $5.35/paper. 


Harper & Row 


10 East 53d Street, New York 10022 


OF EDUCATIONAL 
PSYCHOLOGY 
COURSES 


STATISTICS FOR EDUCATION: 


With Data Processing 
David White 


This introductory statistics text gives education majors actual experience in using statistical 
techniques to solve educational problems. The inclusion of raw educational data from two 
school districts familiarizes students with the common characteristics of such data and pro- 
vides them with experience in interpreting evidence for or against meaningful propositions. 
The book features material on the use of high-speed data processing facilities, sampling 
methods, and decision-making. Solutions to problems and exercises are presented. Just pub- 
lished. 382 pages; $10.95. 


PROBLEM SITUATIONS IN TEACHING 


Gordon E. Greenwood, Thomas L. Good, and 
Betty L. Siegel 


This text provides a decision-making view of teaching through 20 representative case studies, 
developed from data on over 330 problem cases. A strong, detailed introduction suggests a 
strategy for attacking teaching problems. The cases, each reported behaviorally and left un- 
resolved, allow for original student analysis and solutions. Stimulus questions following each 
case serve to provoke and guide discussion. Instructor's Manual. Published. 162 pages; 
$3.95 /paper. 


PSYCHOLOGICAL FOUNDATIONS 
OF EDUCATION, 


Second Edition 
Morris L. Bigge and Maurice P. Hunt 


This problem-oriented text takes a semihistorical and comparative approach in discussing 
how children develop through adolescence, how they learn, the relationship between de- 
velopment and learning, and how knowledge of this relationship promotes effective teaching. 
Cognitive-field theory is contrasted with S-R associationistic and other views. The text differ- 
entiates among schools of thought in psychology to offer a choice of teaching methods and 
to provide a basis for developing cognition and intellectual procedures in the classroom. 
Classroom application of psychological theory is emphasized. Schematic line drawings; case 
studies; references; annotated bibliographies. Instructor's Manual. Published. 603 pages; 
$11.50. 


CHILD DEVELOPMENT 
AND PERSONALITY, 
Fourth Edition 


Paul Henry Mussen, John Janeway 


Conger, and Jerome Kagan 

736 pp. Tentative: $11.95; March, 1974. 
For use with the text: New Instructor's Manual; 
New Study Guide by Fay-Tyler Norton; New 
PSI Study Guide by Ray DeV. Peters; New 
PSI Instructor's Manual by Ray DeV. 
Peters. 


BASIC STATISTICAL 
METHODS, 
Fourth Edition 


N. M. Downie and 
Robert W. Heath 


355 pp.; $10.95 (tentative); March, 1974. Study 
Guidebook. 


CHILD DEVELOPMENT: 
The Human, Cultural, and 
Educational Context 


W. H. O. Schmidt 
181 pp.; Paper: $3.50; July, 1973. 


MORE TEXTS 


from Harper & Row 


Send for our 1974 educational psychology catalog (CT08) 
Harper & Row/10 East 53d Street, New York 10022 


ADOLESCENCE 

AND YOUTH: 
Psychological Development 
in a Changing World 


John Janeway Conger 


573 pp.; $10.95; June, 1973. Instructor’s 
Manual. 


ESSENTIALS OF PSYCH- 
OLOGICAL TESTING, 
Third Edition 


Lee J. Cronbach 
752 pp.; $11.75; 1970. 


EXERCISES IN PSYCH- 
OLOGICAL TESTING 
Eugene R. Oetting and 


George C. Thornton III 
229 pp.; Paper: $5.25; 1968. 


UNDERSTANDING 
SCHOOL LEARNING: 
A New Look at 
Educational Psychology 


Michael J. A. Howe 
299 pp.; Paper: $3.50; 1972. 
The only thing better 
than Biehler 


The Second Edition of Robert Biehler's Psychology 
Applied to Teaching is the only introduction to educational 
psychology that could be better than the First Edition. 

Because the new Biehler has all the best features of 
the old, and then some. 

The Second Edition is as readable and as teachable as 
ever, and Biehler preserves the definitive approach of the 
First Edition — applying psychology to actual classroom situations. 

But the new Biehler has been revised and updated, too. 

The text now covers performance contracting, accountability, 
free schools, open education, inequality of educational opportunity, 
and the Nature-Nurture Debate. Biehler also expands coverage 
of the theories of Bruner, Piaget, Erikson, and Maslow. 

The Second Edition of Psychology Applied to Teaching. 

It's the only thing that could be better than the First Edition 
of Biehler. 


And nothing is better than a better Biehler. 


Psychology Applied 
to Teaching 
Second Edition 
Robert Biehler 


California State University at Chico 
795 pages 


with SELECTED READINGS (1972), 
and new Study Guide and 
Instructor's Manual / January 1974 


- is a better Biehler 


Houghton Mifflin Boston 02107 / Atlanta 30324 / Dallas 75235 / 
Geneva, Ill. 60134 / Hopewell, N.J. 08525 / Palo Alto 94304 


PSYCHOLOGY: 


focus on the learner 


Lita Linzer Schwartz, The Pennsylvania State University, Ogontz Campus 


COMPREHENSIVE: The text includes the topics traditionally covered in begin- 
ning educational psychology courses; the author's ability to weave interesting 
examples into the text heightens motivation and makes the principles more 
concrete. 


FOCUS SEGMENTS: Focus segments clearly illustrate case studies, experi- 
mental research, and practical applications of theoretical principles. 


TIMELY: Innovative topics such as educational accountability and the British 
approach to primary education are included. 


TEACHING AIDS: Outlines, glossaries, bibliographies, and charts clarify the 
information contained in the chapters. An instructor's manual is available 
upon adoption. 1972, 7% x 914, 604 pp. 


HOLBROOK PRESS, INC. 
(a subsidiary of Allyn and Bacon, Inc.) 
Dept. A93, 470 Atlantic Ave., Boston, MA 02210 


EDUCATIONAL PSYCHOLOGY: The Science of Instruction 
and Learning 

By RICHARD C. ANDERSON, Director, Training Research Laboratory, 
University of Illinois, and GERALD W. FAUST, Associate 
Director, Institute for Computer Uses in Education, Brigham 
Young University. 528 pages. Paperback. $7.95. Study 
Guide, $2.95. Instructor’s Manual. 


Everyone who is seriously concerned with the improve- 
ment of instruction should read this practical effort to bridge 
the gap between recent psychological research and classroom 
applications. A textbook that practices what it preaches, it 
incorporates a variety of instructional techniques such as 
behavioral objectives, self-testing, and self-instructional pro- 
grams. This experiential dimension, the authors believe, is 
what is most likely to increase the probability of good teaching. 


DODD, MEAD & COMPANY, 


79 Madison Avenue, New York, New York 10016 


From Macmillan— 


The Current Topics in 
Classroom Instruction Series 


All by Norman E. Gronlund, University of Illinois 


COMING IN 1974 
INDIVIDUALIZING CLASSROOM INSTRUCTION 


Surveys many of the comprehensive systems of individualized instruction cur- 
rently in use across the country, describes some of the common classroom proce- 
dures and techniques, and provides sources of information on other programs 
currently being developed. 

1974 approx. 64 pages paper prob. $1.50 


DETERMINING ACCOUNTABILITY 

FOR CLASSROOM INSTRUCTION 
A practical guide to understanding accountability, its principles and practices, 
that enables in-service and prospective teachers to more effectively participate in 
school accountability programs. 
1974 approx. 64 pages paper prob. $1.50 


IMPROVING MARKING AND REPORTING 
IN CLASSROOM INSTRUCTION 


1974 approx. 64 pages paper prob. $1.50 


PREVIOUSLY PUBLISHED 


PREPARING CRITERION-REFERENCED TESTS 
FOR CLASSROOM INSTRUCTION 
Shows teachers how to prepare and use criterion-referenced tests. Such tests 
are designed to interpret specific types of learning achievement and to yield precise 
descriptions of the student's success in learning. 
1973 55 pages paper $1.50 


STATING BEHAVIORAL OBJECTIVES 
FOR CLASSROOM INSTRUCTION 


Trains students to state objectives for classroom instruction in terms of ex- 
pected behavior outcomes. The author describes the methods used in identifying 
and defining the instructional objectives as learning outcomes. 

1970 58 pages paper $1.65 


MACMILLAN PUBLISHING CO., INC. 
100A Brown Street, Riverside, New Jersey 08075 


Important Macmillan 


EDUCATIONAL PSYCHOLOGY 


A Scientific Foundation for Educational Practice 
Robert M.W. Travers, Western Michigan University 


Following an introductory discussion of the need for scientific information 
to solve problems in education, Educational Psychology follows the develop- 
ment of learning from birth through the school years. It features: 

* Classroom examples that illustrate the scientific principles underlying 
learning and human development, achievement and motivation, reten- 
tion, social development, and pupil assessment. 

* Consideration of such special problems as those of transfer, evaluation, 

and appraisal of learner potential. 

Detailed chapter summaries to facilitate study and review. 

* Many examples of teacher-pupil interactions. 

An accompanying teacher's manual that contains over 300 multiple-choice 

test items. 


1973 448 pages $9.95 


MEASUREMENT AND EVALUATION IN TEACHING 
Second Edition 
Norman E. Gronlund, University of Illinois 


This book introduces teachers and prospective teachers to the principles 
and procedures of measurement and evaluation that are essential to effective 
carried out; and how the written report is presented for maximum effectiveness. 
The book is clear and to the point. It spells out much that has been taken for 
granted in other research texts. The author includes many unique learning 
aids to enhance and further clarify the textual material. They include a sample 
dissertation proposal in the appendix that has marginal notes indicating the 
proper form of presentation; and a practicum in the appendix that allows the 
student to again experience each phase of the textual coverage by actually 
planning and presenting a research proposal. 

1974 approx. 256 pages paper prob. $4.95 
SEX BIAS IN THE EVALUATION OF PROFESSIONAL 
ACHIEVEMENTS¹ 


HARRIET N. MISCHEL 


Stanford University 


In a series of studies, U.S. and Israeli subjects were presented with a 
set of journal articles from diverse fields. The independent variables 
were the sex of the ascribed author (e.g., John R. Simpson, Joan R. 
Simpson), the male versus female association of the field in which the 
article was written (e.g., law versus primary education), and the sex 
and age of the subjects who were asked to evaluate each article. In the 
first study, U.S. male and female high school and college subjects 
showed sex bias in evaluating the journal articles by rating the same 
article more favorably when it was attributed to an author of one sex 
rather than the other sex. Judges tended to prefer authors whose sex 
was the same as that normative for (or strongly associated with) the 
professional field in which the article was written (e.g., a female author 
in dietetics, a male author in city planning). In a cross-cultural 
follow-up study, Israeli subjects did not show the evaluative biases found 
in the U.S. sample. Nevertheless, they had the same stereotypes as 
the U.S. subjects regarding the sex association of the diverse fields. 


The evidence regarding the nature and 
extent of psychological sex differences leaves 
considerable room for diverse interpretations 
(e.g., Maccoby, 1966; Mischel & 
Mischel, 1971). But whether or not men and 
women differ substantially on many psychological 
dimensions and whether these 
differences are innate, immutable, or merely 
the residue of social learning, the evaluation 
of men and women seems to differ 
considerably (Goldberg, 1967; Gray-Shell- 
berg, Stone, & Villareal, 1972; Horner, 1969; 
Pheterson, Kiesler, & Goldberg, 1971) and 
is reflected in highly controversial charges 
of rampant sexual biases that have enor- 
mous social implications (e.g., Millet, 1970). 
Unfortunately, the scope and nature of sex 
bias have been subjected more to polemics 
than to objective research. The present 
studies investigated one aspect of sex bias: 


1 Thanks are due to Margaret Spencer and Tom 
Upton for help in data collection in Study 1 and 
to Judy Lansing for data collection in Study 3, to 
Antonette Zeiss for assistance with statistical anal- 
yses, and to Mordecai Nissan for facilitating ac- 
cess to the Jerusalem city population participating 
in Study 3. 

Requests for reprints should be sent to Harriet 
N. Mischel, Department of Psychology, Stanford 
University, Stanford, California 94305. 


the tendency to differentially evaluate 
achievements on the basis of the sex of the 
performer. 

Prejudicial evaluation of women’s work 
has been offered as an explanation for the 
apparent failure of women to achieve as 
much success as men (Klein, 1950; Schein- 
feld, 1944). A study by Goldberg (1967) 
sought to investigate the prejudice of women 
against other women in the areas of intellec- 
tual and professional competence. Taking 
Allport’s (1954) definition of prejudice as 
the distortion of perception and experience, 
Goldberg hypothesized that when con- 
fronted with an identical piece of work, 
women would value the professional work of 
men more highly than that of women. Gold- 
berg reported a general trend on the part of 
his female subjects to devalue females. 
Women tended to rate a professional jour- 
nal article more highly when it was attrib- 
uted to a male author than when it was 
attributed to a female author, thus showing 
bias against their own sex. 

Goldberg (1967) further hypothesized 
that when the professional field was one 
traditionally reserved for women (e.g., 
primary education, dietetics), the tendency 
of women to devalue women would be 
lessened or even reversed. However, Goldberg 
reported that even in these fields, 
women were prejudiced against other 
women. Goldberg’s finding of antifeminism 
among women is sometimes cited as if it 
were definitive, but a closer examination of 
the data he reported limits their significance. 
Only one out of the six fields evaluated 
showed a difference (i.e., preference for a 
male author) at less than the .05 significance 
level by a one-tailed test and this was for 
city planning, an occupation strongly asso- 
ciated with men for Goldberg’s population. 
Subsequent studies also have cast some 
doubt on the generality of Goldberg’s find- 
ing. For example, Pheterson (1969) found 
that sex bias was absent in a group of 
middle-aged uneducated women. Pheterson 
et al. (1971) found that female college stu- 
dents from the same population as Gold- 
berg’s subjects rated paintings attributed to 
male art-contest entrants more favorably 
than the identical paintings attributed to 
female entrants. However, when the sub- 
jects were told that the artists had been 
awarded prizes, they rated men and women 
equally. 

Goldberg’s (1967) data were provocative 
but the conclusions that may be drawn from 
them are limited not only by the weakness 
of his findings but by the fact that his sub- 
jects were restricted to female college stu- 
dents. What should be expected from men? 
Goldberg’s most surprising result was that 
college women tended to be biased against 
other women even in traditionally feminine 
fields. Would men share this bias? Gold- 
berg's study was executed in the mid-1960s. 
What changes could be expected as a result 
of increasing awareness, especially among 
young adults? Presumably there are cul- 
tural limits on the biases that Goldberg 
reported. More elucidation is needed on the 
parameters of these limits. For example, how 
much of an antibias factor is college educa- 
tion? Would the obtained biases be similar 
in a social structure different from our own? 

The present studies were designed to 
investigate the foregoing questions using 
articles from the same fields as Goldberg 
had and using the same evaluative format 
that he had employed. Measures of sex 
prejudice and sex stereotypes were administered 
to high school and college students 
as well as some older adults. Male and 
female subjects were included in each age 
group so that possible sex differences in any 
prejudice could be noted. The measures 
were given to a sample of Israelis living on 
a kibbutz, to one of Israelis living in a large 
city (Jerusalem), and to U.S. subjects. 


STUDY 1 


In the first study, it was hypothesized 
that in traditionally masculine fields, both 
men and women would show bias against 
women. Thus it was expected that men 
would show the same preference for male 
authors in traditionally male-associated 
fields as did the women in Goldberg’s study. 

It was further hypothesized that men 
(unlike the women in Goldberg’s study) 
would actually be biased in favor of women 
but only in certain restricted, traditionally 
feminine areas of work. It was reasoned that 
by acknowledging certain areas of profi- 
ciency for women, men can demonstrate 
their ostensible fairness while actually ex- 
cluding or derogating women in areas 
reserved for males. Consider, for example, 
the idea that women are more sensitive in 
matters involving emotions and feelings, 
whereas men are superior in reasoning and 
intellectual matters, or that women are ex- 
perts in keeping home and hearth, while men 
are best equipped to handle the affairs of 
the world (as President Richard M. Nixon 
commented to Premier Chou En-lai during 
his 1972 visit to Peking). Such notions may 
reflect a kind of “separate but equal” delega- 
tion of abilities wherein women are allowed 
special “female” expertise but are devalued 
in traditionally masculine domains. 


Method 


Subjects and experimenters. Subjects were 28 
high school students (14 male and 14 female) and 28 
college students (14 male and 14 female). The high 
school students were from the senior class of a large 
public high school drawing from a varied middle- to 
lower-middle-class population in Sunnyvale, Cali- 
fornia. The college students were Stanford Univer- 
sity undergraduates. The same female Stanford 
undergraduate served as the main experimenter for 
all subjects. A male Stanford undergraduate 
assisted in administering the experimental procedure 
to the Stanford undergraduates. 

Procedure. Consistent with Goldberg's (1967) 
procedure, one current article was selected from 
the professional literature of each of four occupa- 
tional fields. The fields were law, city planning, 
primary education, and dietetics. Goldberg's study 
included a preliminary rating of numerous occupational 
fields for their degree of association with 
men or women. He found that law and city plan- 
ning were strongly associated with men and that 
primary education and dietetics were strongly 
associated with women. Thus, the articles in the 
present study were assumed to represent two extremely 
masculine and two extremely feminine 
fields. 

The articles were edited and abridged to approxi- 
mately 700 words each and were combined into 
booklets. Fictitious authors’ names appeared on 
the first page of each article. The critical experi- 
mental manipulation involved the sex of the author 
as conveyed by the Christian name. Each booklet 
contained all four articles but, for any one article, 
half the booklets had a male author's name and 
half had a matched female author's name. Only 
the first name differed, e.g., John R. Simpson, Joan 
R. Simpson. Each booklet had two “male-authored” 
articles and two “female-authored” 
articles. Printed instructions stated: 


In this booklet you will find excerpts of four 
articles written by four different authors in four 
different professional fields. At the end of each 
article you will find several questions which are 
to be answered before you proceed to the next 
article. You are not presumed to be sophisticated 
or knowledgeable in all the fields. We are in- 
terested in the ability of high school (college) 
students to make critical evaluations of profes- 
sional literature. 
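The booklet design described before the instructions (four articles, exactly two carrying a male name and two a female name, with each article male-authored in half the booklets) can be illustrated with a short sketch. The paper does not report how author names were assigned to individual booklets, so the enumeration below is only a hypothetical illustration; the function name `booklet_patterns` and the variable names are assumptions, not part of the original procedure.

```python
from itertools import combinations

# The four fields named in the Procedure section.
FIELDS = ("law", "city planning", "primary education", "dietetics")


def booklet_patterns(fields=FIELDS, n_male=2):
    """Return every booklet layout in which exactly n_male articles
    carry a male author name and the rest carry a female name."""
    patterns = []
    for male_fields in combinations(fields, n_male):
        layout = {f: ("male" if f in male_fields else "female") for f in fields}
        patterns.append(layout)
    return patterns


patterns = booklet_patterns()

# Count how often each article is male-authored across all layouts.
male_counts = {f: sum(p[f] == "male" for p in patterns) for f in FIELDS}

print(len(patterns))   # number of distinct layouts
print(male_counts)     # male-authored count per field
```

Across the full set of layouts, every article appears with a male name exactly as often as with a female name, which is the balance the design requires.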


At the end of each article, the subjects answered 
a set of nine evaluative questions. These ques- 
tions were the same for all subjects and for all 
articles, except of course for the author's name,* 


* An interesting problem arose with regard to 
the inclusion of the authors’ names in the evalua- 
tive questions. Goldberg’s (1967) procedure was 
followed as closely as possible since one of the 
purposes of the study was to replicate and extend 
his findings. Goldberg reproduced only one set of 
questions and in that the reference was to a male 
author. Goldberg refers to him as, for example, 
Mr. Simpson. This left in doubt the title to use 
when the article being evaluated by the judges 
had been attributed to a female author. It was felt 
that Mrs. or Miss might have added an extra 
evaluative dimension that could be difficult to 
assess; that is, it gives additional information 
about the author which might influence an assess- 
ment of competence. The use of Ms. was rejected 
on the ground that this title is too closely asso- 
ciated with the women's rights movement and so 
might unduly sensitize the judges to the dimension 
of interest. The problem was resolved by using 
the ascribed author's first and last names only, for 
example, John R. Simpson and Joan R. Simpson, 
and omitting the Mr., Mrs., and Miss titles altogether. 
which was included in each question. Each question, 
in the set of nine questions, contained five 
response choices, as follows: 


Based on this article, what would you judge 
______'s (name of author) professional competence 
to be? 
1. extremely competent, 2. above-average competence, 
3. average competence, 4. below-average 
competence, 5. incompetent. 

If you were to assign a grade to ______'s 
(name of author) article, what would it be? 
1. A, 2. B, 3. C, 4. D, 5. F. 

In her (his) appeal, was ______ (name of 
author) 
1. very rational, 2. rational, 3. emotional, 4. extremely 
emotional, 5. alarmist. 


The remaining six questions requested evalua- 
tion of the author's professional status, the value 
of the article for the general reader, the effective- 
ness of the author's writing style, and various 
aspects of the impact of the article on the reader 
(e.g., the extent to which the readers’ opinions on 
the issues discussed were swayed by the article). 
For each individual, the scores for all nine rating 
questions were summed for each of the four fields. 
Thus 9 was the lowest (most favorable) possible 
score and 45 was the highest (most critical) pos- 
sible score. 
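The scoring rule just described (nine questions, each answered on a 1 to 5 scale, summed per article) can be sketched as follows. This is a minimal illustration of the arithmetic only; the function name `article_score` and the input checks are assumptions added for clarity.

```python
N_QUESTIONS = 9                  # nine evaluative questions per article
MIN_CHOICE, MAX_CHOICE = 1, 5    # 1 = most favorable, 5 = most critical


def article_score(responses):
    """Sum one subject's nine responses for a single article.

    Lower totals are more favorable; higher totals are more critical.
    """
    if len(responses) != N_QUESTIONS:
        raise ValueError("expected exactly nine responses")
    for r in responses:
        if not MIN_CHOICE <= r <= MAX_CHOICE:
            raise ValueError("each response must be between 1 and 5")
    return sum(responses)


most_favorable = article_score([1] * 9)
most_critical = article_score([5] * 9)
print(most_favorable, most_critical)
```

The two extreme response patterns reproduce the bounds stated in the text: 9 for the most favorable evaluation and 45 for the most critical.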


Results 

As a first step, preliminary t tests were 
computed to compare male and female 
judges’ ratings of the articles in the various 
possible author-field combinations. The re- 
sults showed no sex differences approaching 
significance. Therefore, male and female 
subjects were combined in the following 
analyses. 

The mean ratings of articles in each con- 
dition are shown in Table 1. Note that 
higher ratings indicate more negative evalu- 
ations. These data were submitted to a 
2 x 2 x 2 analysis of variance for repeated 
measures to test the effect of sex of fields 
(male and female), sex of authors, and 
grade level of subjects serving as judges 
(high school and college) on the ratings 
given to the articles. The results (Table 2) 
revealed a highly significant Sex of Field × 
Sex of Author interaction. Inspection of the 
means in Table 3 helps to clarify this inter- 
action. The means indicated that articles 
attributed to male authors were preferred in 
the male fields (law and city planning) 


TABLE 1 
MEAN RATINGS OF ARTICLES IN EACH CONDITION 
(STUDY 1) 

                  Female author          Male author 
Subjects        Female      Male       Female      Male 
                field       field      field       field 
College         19.50       23.93      22.93       21.75 
High school     24.18       24.46      23.64       23.04 


Note. Higher numbers indicate more negative 
ratings throughout the present studies. 


while articles attributed to female authors 
were preferred in the female fields (dietetics 
and primary education). 
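The Sex of Field × Sex of Author means in Table 3 can be recovered from the cell means in Table 1. The sketch below assumes equal group sizes (28 high school and 28 college subjects, as stated in the Method section), so each Table 3 entry is the simple average of the two grade levels; the dictionary layout and names are illustrative only.

```python
# Table 1 cell means, keyed by (author sex, field sex).
TABLE_1 = {
    "college": {
        ("female", "female"): 19.50, ("female", "male"): 23.93,
        ("male", "female"): 22.93, ("male", "male"): 21.75,
    },
    "high school": {
        ("female", "female"): 24.18, ("female", "male"): 24.46,
        ("male", "female"): 23.64, ("male", "male"): 23.04,
    },
}

# Table 3 means as printed in the article.
TABLE_3_PRINTED = {
    ("female", "female"): 21.84, ("female", "male"): 24.20,
    ("male", "female"): 23.29, ("male", "male"): 22.39,
}

# Average over grade level (equal n assumed for both groups).
collapsed = {
    cell: (TABLE_1["college"][cell] + TABLE_1["high school"][cell]) / 2
    for cell in TABLE_3_PRINTED
}

for cell, mean in collapsed.items():
    print(cell, round(mean, 3), TABLE_3_PRINTED[cell])
```

The averaged values agree with the printed Table 3 entries to within rounding, and they show the crossover that defines the interaction: each author sex receives its lower (more favorable) mean in the field associated with that sex.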

The analysis of variance also showed a 
main effect for grade level. High school 
judges gave less favorable (numerically higher) 
ratings to all the author-field combinations 
(Table 1), indicating 
that they were more critical of the articles 
than the college sample. 

There was a significant overall interaction 
between all three variables: Sex of the 
Ascribed Author x Male versus Female 
Association of the Field x Grade Level of 
the Judges (high school and college). This 
three-way interaction seems to be due to the 
fact that the most favorable ratings were 
given by college judges to female authors 
writing in the female fields. The preference 
by college judges for male authors in the 
male fields was evident also, but not as 


TABLE 2 
ANALYSIS OF VARIANCE OF ARTICLE RATINGS AS 
A FUNCTION OF GRADE LEVEL, SEX OF AUTHOR, 
AND SEX OF FIELD (STUDY 1) 

Source                          df       MS        F 
Between subjects                55 
  Grade level (A)                1     182.16     4.05* 
  Subjects within groups (B)    54      45.00 
Within subjects 
  Author's sex (C)               1       1.78     <1 
  A × C                          1      36.17     1.02 
  C × B                         54      35.31 
  Sex of field (D)               1      30.02     <1 
  A × D                          1      44.04     1.28 
  D × B                         54      34.79 
  C × D                          1     147.88     8.77** 
  A × C × D                      1      77.88     4.61* 
  C × D × B                     54      16.87 

*p < .05. 
**p < .01. 
strikingly as the preference for female 
authors in female fields (Table 1). More- 
over, the preference patterns were shown 
much more clearly in the judgments of col- 
lege students than of high school students, 
the latter showing only a general overall 
tendency to favor male authors. While the 
data in Table 1 suggest that high school 
judges prefer male authors to female authors 
regardless of the sex association of the field, 
further analyses (below) show that such a 
generalization is unjustified. 
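The F ratios reported in Table 2 follow from dividing each effect's mean square by the mean square of its error term. The sketch below recomputes them from the printed mean squares as a consistency check. The pairing of effects with error terms is read from the row labels (for example, C × D tested against C × D × B), and the value 77.88 for the three-way interaction is an assumed reading of a garbled entry in the scan.

```python
# (effect MS, error MS, printed F) taken from Table 2.
TABLE_2 = {
    "Grade level (A)": (182.16, 45.00, 4.05),
    "A x C":           (36.17, 35.31, 1.02),
    "A x D":           (44.04, 34.79, 1.28),
    "C x D":           (147.88, 16.87, 8.77),
    "A x C x D":       (77.88, 16.87, 4.61),   # 77.88 is an assumed reading
}


def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect to error mean square."""
    return ms_effect / ms_error


computed = {name: f_ratio(ms, err) for name, (ms, err, _) in TABLE_2.items()}

for name, (_, _, printed) in TABLE_2.items():
    print(f"{name:16s} computed F = {computed[name]:.2f}   printed F = {printed}")
```

The recomputed ratios match the printed values to within about 0.02, which is consistent with rounding in the original table.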

Separate analyses of variance were com- 
puted for each of the four fields (Table 4). 
Each analysis tested the effects of level of 
education of subject (high school or col- 
lege) and sex of ascribed author on the rat- 
ings given to the articles. Significant effects 


TABLE 3 
MEAN RATINGS OF ARTICLES ATTRIBUTED TO 
FEMALE AND MALE AUTHORS IN FEMALE 
AND MALE FIELDS (STUDY 1) 

                    Field 
Author         Female      Male 
Female         21.84       24.20 
Male           23.29       22.39 


were obtained only for the two female fields: 
dietetics and primary education. High 
school students preferred the primary edu- 
cation article when attributed to a male 
author. College students showed no signifi- 
cant differences in their preferences for this 
article (see Table 5). 

To further isolate the sources of the 
overall effects, t tests were performed to 
compare the evaluations by high school and 
college students on articles attributed to 
male versus female authors in each of the 
four fields. In the field of law, college students 
did not differentiate significantly in 
their evaluation of the articles, but high 
school students showed a marked preference 
for the article when it was attributed to a 
male author (t = 2.48, p < .01). In city 
planning, this pattern was reversed. College 
subjects markedly preferred the city planning 
article when it was ascribed to a male 
author (t = 2.10, p < .05), while the preference 
of high school subjects was not significantly 
different for male- versus female-authored 
articles in this field. Thus city 
planning was the one field in which high 
school judges showed no sex bias. 


TABLE 4 
SUMMARY OF ANALYSES OF VARIANCE FOR EACH FIELD AS A FUNCTION OF JUDGE'S GRADE 
LEVEL AND SEX OF AUTHOR (STUDY 1) 

                           Law         City planning       Dietetics       Primary education 
Source             df    MS      F       MS       F       MS       F         MS        F 
Grade level (A)     1   57.21   2.38     3.52    <1        .28    <1       481.81    16.60** 
Author (B)          1   67.91   2.82    41.14    1.14   427.77   15.99**   119.62     4.12* 
A × B               1   75.52   3.14   135.40    3.76    12.68    <1        98.63     3.40 
Error              53   24.08           36.01            26.76              29.00 

*p < .05. 
**p < .01. 

In the field of dietetics, high school and 
college subjects both significantly preferred 
the article when it was attributed to a female 
author (college students, t = 3.26, p < 
.01; high school students, t = 2.38, p < .05). 
In primary education, the effect was as described 
above: high school students were 
biased in favor of the male author (t = 2.78, 
p < .05), while college students showed no 
significant bias. 


Discussion 


The obtained results indicate that while 
the sex of a professional does have a substantial 
biasing effect on the evaluation of his or 
her achievements, the directions of these effects 
are variable, complex, and specific. The 
present study thus does not extend Gold- 
berg’s (1967) finding of professional preju- 
dice against women to a sample of male and 
female judges, even when essays from iden- 
tical professional fields, as well as the same 
evaluative format, were used. The present 
data, in contrast to Goldberg's conclusions, 
show that while an author's sex detracted 
from the evaluation of his or her work in 
some fields, it enhanced attributed status 
in others. Furthermore, these biases ap- 
peared to vary with the grade level of the 
judges, but were not affected by the judge’s 
own sex. Female authors were preferred by 
both high school and college students in the 
field of dietetics, but high school students 
preferred male authors in primary education 
and law, while college students preferred 
male authors in city planning. There were 


no significant differences in the preferences 
for either a male or a female author when 
high school students evaluated a profes- 
sional article on city planning or when 
college students evaluated articles in the 
fields of law and primary education. 

Bias tended to occur in the direction of 
the sex appropriateness of the field. There 
were five instances in which one or both of 
the groups (high school and college) in the 
U.S. sample showed significant sex bias. In 
most of these instances, the preferred author 
was the one whose sex matched the sex 
normatively associated with that field, that 
is, a female author in dietetics, male authors 
in law and city planning. The exception to 
this tendency was the preference of high 
school students for the primary education 
(female field) article when it was attributed 
to a male author. 
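The pattern summarized above can be tallied directly from the significant t tests reported in the Results. The sketch below lists the five significant instances and counts how many favor the author whose sex matches the field's sex association; the record layout and variable names are assumptions for illustration.

```python
# Each record: (field, sex association of field, judge group, preferred author sex),
# taken from the significant t tests reported for Study 1.
SIGNIFICANT_BIASES = [
    ("law",               "male",   "high school", "male"),
    ("city planning",     "male",   "college",     "male"),
    ("dietetics",         "female", "college",     "female"),
    ("dietetics",         "female", "high school", "female"),
    ("primary education", "female", "high school", "male"),
]

# A bias is "field-congruent" when the preferred author's sex
# matches the sex normatively associated with the field.
congruent = [row for row in SIGNIFICANT_BIASES if row[1] == row[3]]
exceptions = [row for row in SIGNIFICANT_BIASES if row[1] != row[3]]

print(len(SIGNIFICANT_BIASES), len(congruent))
print(exceptions)
```

Four of the five significant instances are field-congruent; the single exception is the high school judges' preference for a male author in primary education.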

The sex of the judges had no effect on 
their evaluations of author competence in 
either male or female fields. This finding 
supported the hypothesis that there would 
be no differences between male and female 
judges evaluating professional work in 
male-associated fields. There was no support 
for the hypothesis that male judges might 
more strongly prefer women in fields tradi- 
tionally associated with women. 


TABLE 5 
MEAN RATINGS OF PRIMARY EDUCATION ARTICLE 
(STUDY 1) 

                           Author 
Educational level     Female      Male 
College               19.94       19.67 
High school           28.43       22.87 
Goldberg’s (1967) finding of bias against 
women even in traditionally feminine fields 
was replicated only in the rating of the 
article in primary education and then only 
for the high school judges. Both male and 
female high school students preferred the 
primary education article when it was at- 
tributed to a male author. In the other 
traditionally female field, dietetics, both 
high school and college judges preferred the 
article when it was attributed to a female 
author. This marked preference contradicts 
Goldberg’s result and is the only instance of 
sex bias on which both college and high 
school students agreed. In all other fields, 
only one group (either high school or col- 
lege) showed any sex bias at all, again 
highlighting the specificity of the obtained 
prejudices. 

Highly specific rather than broadly gen- 
eralized sex biases were also obtained by 
Pheterson and her colleagues (1969; Pheter- 
son et al., 1971). Pheterson (1969) failed to 
find bias in favor of males on evaluations of 
articles written in the fields of marriage, 
child discipline, and special education made 
by middle-aged uneducated women. In a 
later study, Pheterson et al. (1971) found 
bias in favor of males among women college 
students evaluating modern art that was 
entered in a competition but had not yet 
won an award. When the work had achieved 
an obvious success, men and women artists 
were rated equally. Pheterson et al. (1971) 
construe the results of these two studies as 
supporting a single interpretation. They 
propose that the subjects in the first study
(Pheterson, 1969) perceived the very pub-
lication of an article as conferring success
upon its author. Thus, they see the results 
of both studies as illustrating the general 
principle of prejudice against women until 
they are proved successful. These data and 
the data of the present study indicate far 
more specificity in professional evaluation 
than Goldberg's earlier interpretation of 
consistent preference for male expertise. 

The differences in bias between the college 
and high school subjects in the present study 
are difficult to interpret. Nevertheless, it 
seems reasonable to speculate that the re- 
duction in bias from high school to college 
in the field of law might reflect the sub-
jects’ increased experience with female
ability in this field. College students may be 
coming into closer contact with the increas- 
ing numbers of women in the field of law 
and thus may be willing to grant females 
competence in this field. The increase in sex 
bias on the part of college students in evalu- 
ating the city planning article might be due 
to their clearer understanding that this 
occupation involves the traditionally male- 
associated skills of engineering and mathe- 
matics. 

The high school students’ bias in favor of 
males in the field of primary education 
might reflect their own dissatisfaction with 
the large numbers of female teachers to 
which they (like most U.S. children) are ex- 
posed from kindergarten through high 
school. College students, with this period 
of elementary schooling well behind them, 
show no sex bias in evaluating the primary 
education article. Perhaps their failure to 
prefer a female author (as they did in the 
other feminine field, dietetics) indicates 
some residual dissatisfaction. 

Goldberg (1967) reported sex role stereo- 
types which he utilized in the design and 
interpretation of the results of his study. 
One possible explanation for the disparity 
between the results of the present study and 
those reported by Goldberg may be that 
judges in the present study did not share 
these occupational sex role stereotypes. 
Goldberg had based his classification on 
normative data obtained from a 6-point rat- 
ing scale. He asked his female college sub-
jects the degree to which they associated a
list of occupational fields with men or with
women. The results showed that the fields of
law and city planning were most strongly
associated with men, primary education and
dietetics most strongly associated with
women. In view of the results of Study 1,
it was decided to recheck the accuracy of
this classification to see whether it was ap- 
plicable to current coeducational college 
populations. 


STUDY 2
Method 


College students (8 females and 13 males) from
Stanford University were asked to indicate the
degree to which each of 10 fields was associated
with men or with women. Specifically, with regard 
to each occupational field, the subjects were asked 
to rate on a 6-point scale the degree to which the 
field was associated either with men (at one end 
of the continuum), or with women (at the other 
end). The scale ranged from -3 (most closely
associated with males) to +3 (most closely associ- 
ated with females). The list included the 4 fields 
used in Study 1 plus 6 others: journalism, psy- 
chiatry, engineering, merchandising, art history, 
and linguistics. 


Results and Discussion 


An analysis of variance showed no over-
all effect of sex of subjects, but there was a
strong effect for field (F = 54.71, df = 
9/171, p < .01). The four fields used in 
Study 1 were rated in the expected direction: 
law and city planning were seen as strongly 
male associated, primary education and 
dietetics as strongly female associated. 
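
The degrees of freedom reported above can be checked with simple arithmetic. The sketch below assumes a mixed design, with sex of subject as a between-subjects factor and field as a within-subjects factor; that design is inferred from the reported values rather than stated in the text, and the function name is hypothetical.

```python
def mixed_design_df(n_subjects, n_groups, n_levels):
    """Return (effect df, error df) for the within-subjects factor
    in a groups x levels mixed analysis of variance.

    Assumption: error df is (subjects - groups) * (levels - 1),
    the usual subjects-within-groups x levels error term.
    """
    df_effect = n_levels - 1
    df_error = (n_subjects - n_groups) * (n_levels - 1)
    return df_effect, df_error


# Study 2: 8 females + 13 males, 2 sex groups, 10 occupational fields.
print(mixed_design_df(8 + 13, 2, 10))  # (9, 171)
```

Under these assumptions the result agrees with the reported df = 9/171 for the effect of field.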

Among the four fields included in Study 1,
the mean rating for law (-2.33) indicated
that it was the field most strongly associated
with males. Primary education (mean rat-
ing, 1.90) was seen as the most strongly as-
sociated with females of the four fields.
Nevertheless, a comparison of the rating 
for dietetics and city planning yielded a 
strong difference (F = 129.68, df = 1/171, 
p < .01), indicating that these two fields 
were significantly separate in their sex as- 
sociation in the expected direction. 

The results of the second investigation 
clearly indicated that law and city planning 
are strongly male-associated fields, dietetics 
and primary education are strongly female- 
associated fields for this college sample. 
This finding supports the use of articles 
drawn from these four fields in attempting 
to investigate possible differential evalua- 
tion resulting from sex role stereotypes. 

The correlation between sex role stereo- 
types and evaluative bias suggested by 
these studies raises the question of causa- 
tion. It seems quite possible that the evalu- 
ations are rational judgments reflecting the 
fact that, most of the time, there is a cor- 
relation between competence and sex ap- 
propriateness. For example, since girls and 
women are expected to be concerned with 
the purchasing, preparation, and serving of 
food and are channeled into the professional
roles of home economist and dietician,³ it is
likely that they will be more expert in the 
skills involved in the field of dietetics and 
will work better in it. Thus, much of what 
is referred to as “bias” in these studies may 
reflect reality judgments with the raters 
operating in the most realistic fashion pos- 
sible to maximize predictive accuracy, given 
the situation. As societal sex roles and sex 
role training change, presumably so will the 
expectations about sex differences in pro- 
fessional excellence in sex-typed fields. 


STUDY 3


The specificity found so far alerts one to 
the need for a more fine-grained analysis of 
the phenomenon of sex bias. The results 
suggest that the relations between sex bias 
and occupational sex role stereotypes are far 
from simple. For example, college students 
showed no bias in their evaluation of articles 
in law and primary education, although 
they clearly had occupational sex role stere- 
otypes about these fields (Study 2), asso- 
ciating law with males and primary educa- 
tion with females. This suggests the possi- 
bility of stereotyping without consequent 
sex bias. This possibility was explored in 
Study 3 which examined sex bias and oc- 
cupational sex role stereotypes cross-cul- 
turally. In Study 3, the procedures de- 
scribed above for evaluating the degree to 
which various occupations were associated 
with either sex and the differential evalua- 
tion of professional competence in selected 
fields were extended to an Israeli sample. 

Israel, and particularly the kibbutz form 
of social structure, seemed a natural con- 
trast to the U.S. sample. Both in their gen- 
eral philosophy and in the specific mecha- 
nisms of their social structure (especially 
in the kibbutz), Israelis have consciously 
sought to eliminate sex prejudice for many 
decades. It was hypothesized that the kib- 
butz subjects would not show bias in evalu- 
ating the articles. It was further expected 
that they would not show strong associa- 
tions of various occupations with either 


3 These stereotypes are supported by many
examples, one of which shows that males and
females are shown in clearly different roles and
occupations in a sample of primary reading text-
books (Jacklin & Mischel, 1973).

sex. A sample of Israelis living in Jerusalem
was included so that their responses could 
be compared to those of kibbutz residents. 
It was expected that Israelis living in a 
city would show stronger sex association of 
particular occupations than Israelis living
on a kibbutz but would not be sex biased in
evaluating articles. It was expected that
Israelis living in Jerusalem would share the
sex role stereotypes of U.S. college students
as a result of the similarity of the socio-
economic conditions in which they lived
but that they would exhibit less sex bias as
a consequence of their national ideology of
sexual equality. Kibbutz residents, experi-
encing a social structure allowing more oc-
cupational realization of sexual equality,
were expected to show less sex role stereo- 
typing than Jerusalem residents as well as 
less sex bias than the U.S. groups studied. 


Method 


Subjects. Two groups of subjects participated 
in the study. One group consisted of high school 
youths and adults from the Kibbutz Barkai of the 
Artzi federation. For the evaluation of professional 
articles, the kibbutz sample was to have included 
every high school youth on Kibbutz Barkai. How-
ever, only about two thirds returned the forms.
Thus, 28 subjects (14 females and 14 males between
the ages of 13 and 20 years) were included in the
article evaluation portion of the study. In addi-
tion, a random sample of 50 kibbutz adults (28
females and 21 males) between the ages of 25 and 
48 years filled out the occupation association scale. 

The second group consisted of 25 students from 
a high school in Jerusalem. This group of 13 
females and 12 males completed both the article 
evaluation questionnaire and the occupation asso- 
ciation rating scale. 

Procedure. The article evaluation questionnaire 
and the occupation association scale used in the
studies conducted in the U.S. were translated into
Hebrew to ensure thorough understanding on the
part of the Israeli subjects. All subjects participat-
ing in the article evaluation were given the same
instructions and articles (translated into Hebrew)
as the U.S. subjects had received. The fictitious
authors’ first or "Christian" names were also
translated into comparable Hebrew names (e.g.,
Devora Simpson, David Simpson) so that there 
would be no question in the judges’ minds about 
the sex of the author. The adult kibbutz sample 


4 Although every effort was made to give the
13- to 20-year-olds both the occupation association
scale and the article evaluations, time constraints
on the members of the kibbutz made it impossible.

filled out the occupation association scale in the
same way as the U.S. sample had. The occupations
listed were the same as those used with the U.S.
sample, with the addition of "psychologists."

The Jerusalem high school students made their 
ratings of the sex association of various occupations 
after completing the essay evaluation portion of 
the experiment. One student did not fill out the
rating sheet correctly; therefore, occupation-rating
data are reported for only 24 Jerusalem students. 


Results and Discussion 


Sex bias. A 2 × 2 × 2 analysis of variance was
computed for repeated measures to test the 
effect of sex of judges (male and female), 
sex of author (male and female), and fields 
(masculine and feminine) on the ratings of 
the articles by the kibbutz sample. The re- 
sults showed no significant main effects nor 
interactions of the three variables. A similar 
statistical analysis was computed for the 
article evaluations of the Jerusalem sample 
with similar results: no significant main ef- 
fects nor interactions of sex of judge, sex 
of authors, and fields. Thus, the ratings of 
the quality of the articles by both the kib- 
butz and the city (Jerusalem) sample 
showed no effect of field or author distinc-
tions.

Occupational sex role stereotypes. An 
analysis of the degree to which the Israeli 
sample associated particular occupational 
fields with either males or females indicated 
that different fields were perceived as dif- 
ferently sex associated by both the kibbutz 
and the city samples. 

For the kibbutz sample, a one-way analy- 
sis of variance with repeated measures was 
performed on the ratings for each field. This 
indicated that the different fields were per- 
ceived as differentially sex associated (F = 
142.19, df = 10/610, p < .001). Contrasts 
were then computed to determine whether 
the four fields from which articles were 
evaluated were clearly sex associated for
the kibbutz sample. The first contrast com-
pared law and city planning combined with
primary education and dietetics combined.
It showed that these fields are indeed sex
associated in the expected way (F =
1,001.68, df = 1/610, p < .001), law and
city planning being strongly male associated
and primary education and dietetics
strongly female associated. Additional con-
trasts showed that primary education and
dietetics are perceived as equally female
fields (F <1) and that law and city 
planning are perceived as equally male 
fields (F = 1.72, ns). 

For the Jerusalem sample, a one-way 
analysis of variance with repeated measures 
was performed on the ratings for each field. 
This indicated that different fields are per- 
ceived as differentially sex-associated (F = 
41.93, df = 1/220, p < .001). Next, contrasts 
were computed to determine whether the 
Jerusalem sample’s ratings of law, city 
planning, primary education, and dietetics 
were sex associated in the expected way. 
The first contrast compared law and city 
planning combined against primary educa- 
tion and dietetics combined. It showed that 
the first two fields were clearly more male 
associated, whereas the second two fields 
were clearly more female associated (F = 
249.07, df = 1/220, p < .001). An additional 
contrast showed that dietetics was perceived 
by the Jerusalem group as a more feminine 
field than primary education (F = 5.04, 
df = 1/220, p < .05). Primary education 
ranked second to dietetics in degree of fe- 
male association for the 11 fields rated. Law 
and city planning were perceived as equally 
masculine (F < 1). 


CONCLUSIONS 


The results of the present three studies 
contradict the generalized preference for 
male expertise reported earlier by Gold-
berg (1967) and underline the subtle dis-
criminativeness and temporal instability of
sex bias. Goldberg had hypothesized that
while women would prefer an article written
by a male author in a male field, they would
prefer a female author in a female field. His
results indicated that women preferred ar-
ticles attributed to male authors even in
female fields and thus seemed to demon-
strate a pervasive, nonspecific preference
for male expertise. In contrast, in the pres- 
ent studies, sex bias (i.e., a differential eval- 
uation of work on the basis of sex) appeared 
to be a function both of the particular field 
in which work was judged and of the educa- 
tional level of the subject doing the evalua- 
tion. The patterns of sex bias were similar 
regardless of the sex of the judge. The ob- 
tained interaction between the particular 
field in which an author was being evalu- 
ated, the educational (or grade) level of the 
judge doing the evaluation, and the sex of 
the author indicates that sex bias was highly
specific to certain fields and changed as 
educational level changed. 

An Israeli sample (both a kibbutz and a 
Jerusalem group) showed no sex bias in 
evaluating professional journal articles. The 
finding that the Israeli subjects showed less 
sex prejudice than did the U.S. subjects 
might, of course, be dismissed as merely due 
to a failure to select occupations for which 
the sex role stereotypes in Israel are similar 
to those in the United States. Such an inter-
pretation is untenable, however, because
the results showed that the Israeli subjects 
had the same occupational sex role stereo- 
types as the U.S. subjects regarding the sex 
association of the four critical fields. Never- 
theless, these shared stereotypes did not 
affect the judgments of the Israeli subjects. 
Unlike the U.S. subjects, they did not dis-
tinguish in their evaluations of the articles
on the basis of the sex of the attributed
author. A summary of the preference pat-
terns found in both U.S. and Israeli sub-
jects is shown in Table 6.

TABLE 6
JUDGES’ PREFERENCES FOR ARTICLES ATTRIBUTED TO MALE (M) VERSUS FEMALE (F) AUTHORS IN EACH FIELD

                                       Female field                  Male field
Sample                            Dietetics  Primary education  City planning  Law
U.S. high school                  F > M*     M > F*             M = F          M > F
U.S. college                      F > M**    M = F              M > F*         M = F
Israel (kibbutz and Jerusalem)    M = F      M = F              M = F          M = F

* p < .05.
** p < .01.

The shared sex occupation stereotypes 
found in the samples from the two cultures 
are congruent with several reports by Ameri-
cans (e.g., Bettelheim, 1969; Spiro, 1958)
and by Israelis themselves (R. R. Eiferman, 
personal communication, 1971) that even 
in Israel, men and women are thought to 
be more “naturally” inclined to certain dif- 
ferent occupations. In fact, it is just this 
sex association of various occupations that 
seems to have led many of these observers 
to express some pessimism regarding the 
possibility of completely eliminating sex 
bias and inequality in Israel. 

The present finding that sex occupation 
stereotypes can exist side by side with 
absence of sex bias gives some cause for
optimism. While there was no direct mea- 
sure of the sex role stereotypes, if any, sub- 
scribed to by the group of kibbutz youth, it 
is instructive that their immediate elders (in 
many cases their parents) and their age 
counterparts in Jerusalem associate law and 
city planning, dietetics and primary educa- 
tion differentially with males and females 
in the same way as a U.S. college sample. 
There is growing evidence that beliefs, at- 
titudes, and behavior are not necessarily 
consistent (e.g., Aronson, 1972; Bem, 1970). 
As the present data show, beliefs about oc- 
cupational sex stereotypes need not produce 
sex bias in the judgment of competence in a 
particular field. 

Although it is difficult to generalize from 
such small samples to the different cultures 
from which they are drawn, it seems plausi- 
ble that in a culture in which women do 
have more professional opportunities and 
are seen as more equal to men in their abil- 
ities, there would be little evaluative sex 
bias. In Israel, the availability of, and 
experience with, competent women in a 
greater variety of fields (e.g., in the army
and as head of the government) may break 
down biases in practice even though sex role 
stereotypes, remarkably similar to those 
found in the United States, continue to 
exist at the level of attitudes. Finally, it
may be speculated that in studies such as
the present three and those of Goldberg
(1967), Pheterson (1969), and Pheterson et
al. (1971), the phenomenon called sex “bias”
may partly reflect reality judgments about
the differential probability of success for
males and females within particular fields.
These sex differences in the objective prob- 
ability for success within particular fields 
may be mirrored by the stereotypes and 
“biases” shared by members of the culture.


AN EXTENDED APPLICATION OF CONTINGENCY MANAGEMENT

Sixteen black and white inner-city public school teachers who were
trained to use positive behavior contingencies did so for one academic
year with a total of 730 Afro-American disadvantaged pupils from the
first through the eighth grades. Compared with matched control teach- 
ers and classes, these 16 teachers showed higher incidences of positive 
reinforcement and lower incidences of punishment. The experimental 
classes were less disruptive and more on task. They gained more in 
both IQ and school achievement. Inner-city teachers can be trained to 
employ positive techniques of behavior management; they like and 
use such training; and public school pupils profit dramatically from re- 
structuring their learning environment. 


Both black and white children from low 
socioeconomic backgrounds are failing to 
gain an adequate education in the nation’s 
central-city schools (e.g., Coleman et al., 
1966; Dittman, 1967; Kvaraceus, 1965;
McCandless, 1967, 1970). In fact, the educa- 
tional achievements of these children have 
been repeatedly documented as dismal. As 
a group, they fall further and further 
behind their economically advantaged, sub- 
urban peers with each year of schooling 
(e.g., McCandless, 1970). 


1 This study was conducted under the auspices of 
the Georgia Department of Education, which pro- 
vided funds allocated under Title III, Public Law 
89-10. Readers will note that incredible amounts 
of effort and goodwill on the parts of many were 
necessary for this study to be conducted. The 
Project staff, the teachers involved in the class- 
rooms, and the principals of the different schools 
deserve at least as much credit as the authors. 
Thanks should also be extended to all the chil- 
dren and youth who took part. 

2 Requests for reprints should be sent to Howard
A. Rollins, Psychology Department, Emory Uni-
versity, Atlanta, Georgia 30322, or to Marion
Thompson, Director, Project Success Environment,
210 Pryor Street, Atlanta, Georgia 30303.

Now at Western Carolina Center, Morgan-
town, North Carolina.


The list of variables offered as explana-
tions for the academic plight of the inner-
city child appears endless (cf. Bronfen-
brenner, 1974). However, as Becker,
Engelmann, and Thomas (1971), among
others, suggest, etiology may be less im-
portant than the academic environment in
which these children are placed. Our schools
are designed to build successively year after
year upon skills acquired by the children in
previous years. If at any point a child has
not acquired the appropriate prerequisite 
skills, failure is likely. For inner-city chil- 
dren, such failures often occur early, since 
they typically enter school poorly prepared 
to handle both the standard public school 
curriculum and the middle-class format of 
the classroom. Further, a history of failure 
may promote expectations of failure which 
in turn make actual failure more likely. 
Thus, inner-city children are forever behind, 
confused, and as a consequence, probably 
lose all interest in undertaking new aca- 
demic tasks. As a result, inner-city class- 
rooms are filled with unhappy, restless 
children who are relatively uninvolved in 
academic work and often are highly disruptive.

If this analysis is correct, then one logical
course of action is to replace failure with 
success. To guarantee inner-city children 
success, the school curriculum must be 
modified to provide opportunities for success 
by matching the material presented to the 
level at which these children function, and 
teachers must be trained to emphasize 
success while minimizing failure. 

Contingency management is one vehicle 
available for providing inner-city students 
with success experiences. Based on the 
principles of operant conditioning, contin- 
gency management has proved to be an 
effective motivational system (e.g., Staats, 
1964; Staats & Staats, 1963). Both social 
and token reinforcements have been used in
classroom settings to accelerate appropriate 
behaviors, such as attending behavior, and 
to decrease disruptive behaviors. Praise and 
teacher attention have provided adequate 
incentives for many students to perform 
effectively (e.g., Becker, Madsen, Arnold, 
& Thomas, 1967; Harris, Wolf, & Baer, 
1964; Madsen, Becker, & Thomas, 1968; 
Zimmerman & Zimmerman, 1962). Token 
reinforcements (tangible objects or symbols
which, when exchanged for a variety of
other objects such as edibles or playthings,
acquire reinforcing power themselves) have
also proved effective in modifying pupil
behavior (e.g., Kupers, Becker, & O'Leary,
1968; McLaughlin & Malaby, 1971;
O'Leary, Becker, Evans, & Saudargas, 
1969). 

Other investigators have shown that aca- 
demic achievement can be accelerated 
through manipulating contingencies. In a 
series of studies, Staats (1964; Staats, 
Minke, Finley, Wolf, & Brooks, 1964; 
Staats & Staats, 1963) reported significant 
gains in reading achievement using token 
reinforcement. Staats and his colleagues 
(e.g., Staats & Staats, 1963), among other 
relevant theoretical projections, suggest 
“work” rooms in schools where token rein- 
forcers are earned for learning activities 
and other rooms and situations where the 
child receives the reinforcement that backs
up the tokens. This was a key strategy em- 
ployed in the present study. Among others, 
Clark and Walberg (1968) and Wolf, Giles,
and Hall (1968) have reported similar re-
sults. Teachers have also been trained to
manipulate contingencies in the classroom,
and the results have been encouraging (e.g.,
Hall, Lund, & Jackson, 1968; Hamblin,
Buckholdt, Ferritor, Kozloff, & Blackwell,
1971; Thomas, Becker, & Armstrong, 1968).

The following points should be made
about the present study. First, it has been
clearly demonstrated, and is supported by
our own observations made early in the
course of conducting this study, that
teachers of inner-city pupils typically em-
ploy negative and even punitive methods as
their major incentive technique for behavior
control and academic learning. Second, it is
clear that behavior modification is not
necessarily a positive technique, but can be
and often is accomplished by means of aver-
sive incentives. Third, most of those report-
ing in the literature, either by themselves or
by way of specialists trained by them, have
been interested in finding whether the be-
havior modification technique worked but
have been less interested in training class-
room teachers in its use.

In the present study, we have worked to
move teachers from the employment of a pre-
ponderance of negative to a preponderance
of positive incentives. Appropriate behavior
is rewarded, inappropriate behavior is
ignored, and almost no aversive incentives
are used. Also, our emphasis has been on the
pre-service and in-service training of teach-
ers rather than on that of specialists in the
use of positive behavior modification. Fur-
ther, most behavior modification investi-
gators who have reported in the literature
have worked with individual students or
small groups for limited periods of time
such as six weeks. In the present study,
contingency management technique was im-
plemented in a large number of inner-city
classrooms from the first to the eighth
grades for an entire academic year.

Entitled “Project Success Environment,”
the pilot 1970-1971 study included eight ex-
perimental classes with appropriate com-
parison classes at the first-, second-, third-,
and seventh-grade levels (cf. Thompson,
Brassell, Persons, Tucker, McCandless, &
Rollins, 1973). Following this initial devel-
opmental effort and its encouraging results,
the program was expanded to include twice
the number of students within a wider age 
range during the second year of operation, 
1971-1972. A reasonably rigorous experi- 
mental design was also incorporated in the 
second year. 

As a group study, Project Success En- 
vironment was not designed as an exercise in 
scientific analyses of behavior. Its purpose,
rather, was to answer an actuarial question 
of the sort suggested by Baer (1971): Can 
behavior modification solve the referring 
social problem, which has been analyzed 
into two sets of behavior—those too high 
and those too low in rate? The central ques- 
tion in this study is whether teachers can be 
trained to use the techniques made available 
through behavioral analysis to provide large 
numbers of students from economically
disadvantaged backgrounds with some mod- 
icum of individual success. 


METHOD 


Subjects and Setting 


Three hundred and sixty-seven male and 363 fe- 
male black pupils enrolled in a public middle 
school and three of its feeder elementary schools 
participated in the study. The four schools are 
located on the fringe of inner-city Atlanta and are 
characterized by substandard educational achieve- 
ment and a high proportion of pupils from low-
income families; for example, 54.8% of the pupils
in one of the elementary schools are from families 
with annual incomes of $2,000 or less. 

The subject population during the second year 
of the study (the present data) consisted of pupils 
in the first, second, third, fourth, sixth, and eighth 
grades. The total sample was divided into an 
experimental group and a control group of 16 and 
14 classes, respectively, with from 22 to 25 pupils 
per experimental class and from 25 to 28 pupils 
per control class. The control classes at the ele- 
mentary level were in a nearby elementary school, 
while those at the middle-school level (sixth and 
eighth grades) were in the same school. Of the 355 
experimental subjects, 154 were exposed to the 
treatment over a period of two consecutive years. 
The rest were involved in the study for the second 
year alone, so that the percentage of “two-year” 
pupils in the experimental classes during the sec- 
ond year ranged from 0% in seven classes to 81% 
in two classes. Thirteen experimental and 10 con- 
trol teachers were black, the others were white. 
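
The sample figures given above can be cross-checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the control-group count is an assumption, derived by subtracting the 355 experimental pupils from the 730 total rather than taken from the text.

```python
def sample_summary():
    """Cross-check the reported sample sizes (counts taken from the text)."""
    males, females = 367, 363
    total = males + females
    experimental = 355                 # reported
    control = total - experimental     # assumption: derived, not reported
    two_year = 154                     # experimental pupils treated for two years
    return {
        "total": total,
        "control": control,
        "two_year_pct": round(100 * two_year / experimental, 1),
        "mean_experimental_class": round(experimental / 16, 1),
        "mean_control_class": round(control / 14, 1),
    }


print(sample_summary())
```

If the assumption holds, the mean class sizes computed this way fall inside the reported ranges of 22 to 25 pupils per experimental class and 25 to 28 per control class.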

Although control subjects were identified during 
both years of the study, those for whom data are 
reported here were selected just before the begin-
ning of the second academic year. With the ex-
ception of the first-grade pupils, the experimental
and control subjects were matched on the basis of 
reading scores obtained the previous April on the 
Metropolitan Achievement Tests. 


Procedures 


Behavior modifiers. The teachers of the 16 ex- 
perimental classes served on a voluntary basis as 
behavior modifiers within the framework of the 
public school setting. Eight of the teachers par-
ticipated in the study from its inception, while
the remaining 8 participated only during the 
second year. The 14 control teachers were selected 
at the beginning of the second year by their re- 
spective principals from the available faculty at the 
appropriate grade levels. Most of the experimental 
and control teachers were female, with previous 
classroom experience of from 1 to 13 years. A para- 
professional aide was available to each experi- 
mental and control teacher for approximately 90 
minutes per day to assist with clerical and logistical 
tasks. 

Teacher training. Most of the training for the ex- 
perimental teachers was accomplished in a three- 
week workshop during the summer preceding the 
second academic year. (The eight experienced 
teachers had also participated in a similar three- 
week seminar in the first year.) The workshop was 
conducted by three psychologists and three edu- 
cators and was designed to provide instruction in 
the theory and practical application of operant 
conditioning and to involve the teachers in plan- 
ning for the classroom implementation of be- 
havioral management procedures and various cur- 
ricular activities. 

During the mornings, the teachers participated 
in discussion sessions that were primarily focused 
on readings in behavior modification from Teach- 
ing: A Course in Applied Psychology (Becker, et 
al., 1971) and other sources. The teachers then had 
the opportunity to apply behavioral management 
principles in classroom settings while being ob- 
served by their peers, and a videotape recording 
was made of them, which was to serve as the basis 
for further discussion in classroom management. 
The teachers were also exposed to systematic class- 
room observation by collecting data in actual 
classrooms using the procedures and forms that 
trained observers would use later in their class- 
rooms. In addition, each teacher shared in the 
identification of the pupil behaviors to be modified 
during the following year and in the establishment 
of a token economy to support the behavior modi- 
fication effort. 

The afternoons during the workshop were de- 
voted to curriculum planning, especially for the 
initial weeks of school. Emphasis was placed on 
formulating behavioral objectives for pupils, de- 
veloping and using individualized instruction tech- 
niques, using programmed reading materials and 
academic diagnostic instruments, and establishing 
and maintaining a specific classroom arrangement. 
Throughout the year, an experienced behavioral
management technician was available at least twice
weekly to assist each teacher with current problems
in classroom management. Individual in-service 
sessions concerning curriculum implementation 
were also conducted weekly by two curriculum co- 
ordinators, one of whom concentrated on the ele- 
mentary curriculum and the other on the middle- 
school curriculum. 

Target behaviors. For the first six to eight weeks 
of school, the emphasis was on the positive rein- 
forcement of desirable classroom conduct in an 
effort to increase the frequency of on-task behavior 
and to decrease the frequency of disruptive be- 
havior. On-task behavior was defined as apparent 
attention to assigned academic tasks, while dis- 
ruption subsumed any unsolicited behavior serv- 
ing to distract pupils from academic tasks, for 
example, physical contact or inappropriate social 
conversation among pupils. Appropriate classroom 
behavior in the elementary classes was stipulated 
by the following set of conduct rules agreed upon 
by the elementary teachers during the summer 
workshop: (a) stay in your seat, (b) work hard, 
(c) pay attention, and (d) raise your hand to 
speak. The rules agreed upon by the middle-school
teachers were as follows: (a) pay attention, (b)
have necessary tools for work, (c) stay on task, 
and (d) raise your hand for recognition. 

Although these behavioral guidelines were com- 
mon across classes, each teacher was encouraged 
to interpret them according to her individual 
teaching style and to relate her precise interpreta- 
tion of the guidelines to her pupils on the first day 
of school. Some teachers chose to specify that
their pupils remain in their seats except when 
granted permission to move; others indicated to 
their pupils that they could circulate freely within 
the classroom, provided they were engaged in an 
academic task. In any event, the teachers were con- 
sistent within their respective classrooms in the 
specification and execution of behavioral con-
tingencies. Once desired conduct was established
within a class, the emphasis shifted partially to the 
reinforcement of academic behavior. For most 
classes, this shift to the reinforcement of academic 
behavior occurred no later than the third week of 
school. 

Behavior management. The teachers adhered to a 
basic premise of “ignore and praise”; that is, they
attempted to attend to and reward suitable behavior
while disregarding improper conduct. Behavioral 
management, then, was implemented overtly by 
means of positive reinforcement, although the 
possibility exists that the teachers may have also 
used subtle forms of punishment, such as the 
withdrawal of attention. In addition to praise 
and other forms of social reinforcement, the 
teachers relied heavily on a token system in which 
checkmarks on “reward cards” and tickets were 
dispensed in the elementary and middle-school 
classes, respectively. Since the elementary classes 
were self-contained, the elementary pupils were 
exposed to the behavioral contingencies through-
out each school day. The middle-school classes,
however, were taught by teams (three teachers
per team in the sixth grade and four in the eighth
grade), so that the pupils were exposed to the con-
tingencies for approximately four hours daily dur-
ing the mornings while they attended the basic
classes (reading, mathematics, social studies and, 
in the eighth grade, science) taught by the ex- 
perimental teachers. During the afternoons, the 
middle-school pupils attended nonexperimental 
exploratory classes, such as music, art, and home 
economics. 

Throughout the first day of school and for 
several days thereafter, immediate primary rein- 
forcement (M & Ms and hard candy) was paired 
with praise and token reinforcement contingent 
upon approximations of desired social conduct, in- 
cluding such behaviors as simply coming to school 
and sitting at a desk. Enough tokens were dis-
tributed within the first two days for every pupil
to exchange them for a variety of backup rein- 
forcers, including both inexpensive "fun" items 
and school supplies. During the initial two weeks 
of school, reinforcement was dispensed on a gen- 
erally continuous and predictable basis, but as 
the desired behaviors were gradually shaped, the 
tokens were dispensed on more intermittent, less 
predictable schedules. In order to generate more 
intermittent schedules and to provide a mechanism 
for delay of reinforcement, the number of check- 
marks required to complete a reward card for the 
elementary pupils increased progressively from 
25 to 150. In the middle-school classes, not only 
were less expensive items (in terms of tokens) re-
placed by more expensive items, but the prices
of various items were increased so that, for in- 
stance, an item which could be obtained for 5 
tickets on the first day of school was worth 26 
tickets by the end of the second week.

During the third and fourth months, all tangible 
rewards, with the exception of certain school 
supplies, were replaced by activity reinforcers so
that the pupils traded tokens for access during
school hours to the activity room supervised by
the paraprofessional aides. The activity rooms at
the elementary schools were stocked with such
items as games, toy cars, comic books, sewing kits,
dolls, Tinker Toys, and Lincoln Logs, while the
middle-school activity room contained not only
games, magazines, and puzzles, but also records
for listening and dancing, as suggested by a stu-
dent committee. The elementary pupils were also
able to exchange tokens for the privilege of assist-
ing the teacher in such capacities as playground
monitor, chalkboard monitor, and “mini-teacher”
or tutor.

Although there had been some concern that
the pupils would find the transition from tangible
to intangible rewards unpalatable, there were no
apparent detrimental effects, perhaps because there
was a four- to six-week overlap between the two
types of reinforcement and because the pupils
were told of the impending change six weeks in
advance. Immediately prior to the final conversion 
to activity reinforcers, auctions were held in each 
classroom to dispose of the remaining tangible 
rewards. From this point through the remainder 
of the year, tokens could be exchanged only for 
activity reinforcers and a limited assortment of 
school supplies, such as pencils and notebooks. 

Classroom arrangement. A classroom arrange-
ment, consisting of a mastery center for instruc-
tion and five academically oriented interest sta- 
tions, served to structure the instructional pro- 
gram and concomitantly to free the teachers for 
more interaction with individual pupils and small 
groups. Within the mastery center, the pupils were 
divided into three ability groups in which they re- 
ceived instruction and completed academic assign- 
ments. While one group received instruction and 
the second completed assigned tasks, the third 
group visited the various interest stations that 
were designed to foster individual and small group 
exploratory behavior without direct teacher inter-
vention. The five stations included a library sta-
tion with books, magazines, and newspapers; an
art station with a variety of paints, crayons, and
other materials; a communications station with a
language master, phonograph, and tape recorder;
an exploratory station with an assortment of
science materials keyed to the instructional pro-
gram; and a games and puzzles station equipped
primarily with academically related materials. 
The materials at the stations were changed or 
rotated among the classrooms at least weekly by 
the paraprofessional aides. 

Control classes. All control classes were con- 
ducted in a traditional manner with a single 
teacher managing each class in a lecture format. 
Control teachers had access to numerous academic 
materials but seldom used materials other than 
those prescribed by the school system. None of the 
control teachers received any formal training in the 
use of contingency management procedures. 


Behavioral Observations 


Five paraprofessional data gatherers system- 
atically observed teacher and pupil behavior in 
each class for 45 minutes daily during the first three 
weeks of school and twice weekly during the re- 
mainder of the year. Each observer gathered data 
in both the experimental and the control classes 
during observation sessions, which varied from
morning to afternoon and occurred only during
periods of academic activity.

During each 45-minute observation session, the 
relevant behaviors were observed three times in 
structured 15-minute sequences in order to obtain 
more typical behavioral samples. Within a single 
15-minute sequence, data were obtained using 
three different procedures to observe and record 
teacher reinforcement and punishment, pupil dis- 
ruption, and pupil attention, in that order. 

Positive reinforcement and punishment. The 
teacher alone was observed for 5 minutes in each 
sequence (for a total of 15 minutes during the
entire 45-minute observational session). The data
gatherer counted and classified every instance of 
teacher-administered positive reinforcement ac- 
cording to the nature of the reinforcer (tangible 
or nontangible) and according to the nature of the 
behavior being reinforced (academic or conduct). 
The average number of positive reinforcements 
administered per student in a 15-minute period 
constituted a criterion measure, which was ob- 
tained by dividing the total number of reinforce- 
ments administered by the number of pupils pres- 
ent during the observation session. A second 
criterion measure consisted of the total number 
of instances of punishment. 
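
The per-pupil criterion measure described above can be sketched as a single division. The function name and the example counts below are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def per_pupil_rate(total_events: int, pupils_present: int) -> float:
    """Average number of recorded events per pupil for one observation session.

    total_events: events tallied by the data gatherer (e.g., reinforcements).
    pupils_present: pupils in the room during the session.
    """
    if pupils_present <= 0:
        raise ValueError("at least one pupil must be present")
    if total_events < 0:
        raise ValueError("event count cannot be negative")
    return total_events / pupils_present


# Hypothetical session: 36 reinforcements tallied with 24 pupils present.
example_rate = per_pupil_rate(36, 24)
print(f"{example_rate:.2f} reinforcements per pupil")
```

The same division applies to the disruption measure, with the disruption tally in the numerator.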

Teacher behaviors recorded as positive rein- 
forcement included verbal praise, positive physical 
contact, the granting of privileges, and the admin- 
istration of tangible rewards such as candy or 
tokens (which were administered only in the ex- 
perimental classes). Punishment included criticism 
implied explicitly or implicitly through threats of 
consequences, voice tone, or facial expression; 
aversive physical contact with pupils; withdrawal 
of pupil privileges; and isolation of pupils. 

Disruption. During the second 5-minute period 
in each 15-minute sequence, the data gatherer
ignored the teacher while continuously scanning 
the entire class for instances of disruptive pupil 
behavior. In general, disruption encompassed any 
unsolicited pupil behavior serving to distract other 
pupils from academic tasks, such as talking or being 
out of one’s seat without permission; generating
loud noises; or disturbing other pupils either 
verbally, by means of physical contact, or by 
handling another pupil's possessions. A single 
pupil could not be observed for disruption more 
often than once every 10 seconds. The criterion 
measure was the average number of disruptions 
per pupil per 15 minutes, obtained by dividing the 
total number of disruptions recorded by the num- 
ber of pupils present during the observation session. 

Attention. Attentive behavior was observed
during the final 5-minute period of the 15-minute 
sequence. One third of the pupils assigned an aca- 
demic task were observed during each of the three 
5-minute periods, with the focus being on at- 
tention; therefore, each pupil was observed for 
attentive behavior one time only for 20 seconds 
within the entire 45-minute observation session. 
The data gatherer recorded the number of sec- 
onds during which the pupil was off task; that is, 
during each 20-second interval, the behavior of 
one pupil was observed and the amount of time 
apparently devoted to other than academic tasks 
was recorded. Each pupil observed was classified 
as involved (0-5 seconds off task), medium in- 
volved (6-15 seconds off task), or uninvolved (16- 
20 seconds off task). The criterion measure was 
the percentage of time on task for the entire class, 
calculated by adding the number of pupils classi- 
fied as involved to one half of the number classi- 
fied as medium involved, then dividing the sum 
by the total number of pupils observed, and
multiplying the quotient by 100. 
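
A minimal sketch of the attention measure as described above: each pupil's 20-second sample is classified by seconds off task, and the class percentage counts involved pupils fully and medium-involved pupils at one half. The function names and the sample values are illustrative assumptions, not the authors' scoring materials.

```python
def classify_involvement(seconds_off_task: int) -> str:
    """Classify one 20-second sample using the cutoffs given in the text."""
    if not 0 <= seconds_off_task <= 20:
        raise ValueError("seconds off task must lie between 0 and 20")
    if seconds_off_task <= 5:
        return "involved"
    if seconds_off_task <= 15:
        return "medium involved"
    return "uninvolved"


def percent_on_task(samples: list[int]) -> float:
    """Class percentage on task: (involved + 0.5 * medium) / observed * 100."""
    if not samples:
        raise ValueError("no pupils were observed")
    labels = [classify_involvement(s) for s in samples]
    involved = labels.count("involved")
    medium = labels.count("medium involved")
    return 100.0 * (involved + 0.5 * medium) / len(labels)


# Hypothetical seconds off task for four observed pupils.
hypothetical_samples = [0, 3, 10, 18]
print(percent_on_task(hypothetical_samples))
```

Note that under this scoring an uninvolved pupil contributes nothing to the numerator, so the measure is a weighted proportion of pupils rather than a direct sum of seconds on task.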

Interrater reliability. Reliability coefficients were 
obtained periodically for the five data gatherers 
by comparing their observations with the simul- 
taneous observations of one of the behavioral 
management technicians. Most of the resulting 
coefficients were above .80. The median coefficients 
(over 12 reliability checks) for reinforcement, 
punishment, disruption, and attention were .94, 
.78, .90, and .88, respectively.

Academic aptitude and achievement. One of the 
primary goals of the program was to accelerate 
both academic aptitude and achievement. In order 
to assess these objectives, the short forms of the 
California Test of Mental Maturity and the 
California Achievement Test were administered to 
all experimental and control subjects in mid- 
September and again in early May. The latter 
test was administered at reading level rather than 
grade level, since testing during the first pilot 
year of the study had indicated that these chil- 
dren frequently performed at chance when tested 
at grade level. 


Results


Behavioral Observation 


The teacher training program, among 
other things, was designed to increase the 
frequency of teacher-administered rein-
forcements and to reduce the frequency of
teacher-administered punishments. The ef-
fects of the training program on teacher re-
inforcements are given in Figure 1 relative
to controls across the entire school year.
Each data point in Figure 1 is an average of 
the observations within a one- or two-week 
period. It is evident in Figure 1 that experi- 
mental teachers administered more rein- 
forcements per student than control teach- 
ers. In fact, the experimental teachers 
essentially doubled the rate of reinforcement 
administered by control teachers. The num- 
ber of reinforcements delivered by project 
teachers dropped over the course of the 
school year, reflecting progress toward an 
intermittent schedule. However, even at the 
end of the year, project teachers reinforced 
their students at more than twice the rate of 
control teachers. 

A 2 × 2 × 18 (Frequency of Reinforce-
ment Delivery with Treatment Group ×
Elementary versus Middle-School Grade
Level × Observation Interval) analysis of
variance was performed. Consistent with
the above observations, experimental teach-
ers administered reliably more reinforce-
ments than control teachers (F = 36.98, df =
1/26, p < .01). In addition, the Treatment
Group × Observation Interval interaction
was reliable (F = 4.63, df = 17/442, p <
.01), reflecting the reduced separation of ex-
perimental and control teachers across the
school year. There were no reliable effects
of grade level.

FIGURE 1. Mean number of reinforcements per student over weeks as a function
of treatment condition and grade level.

Figure 2 is a representation of the average 
number of punishments per 15-minute in- 
terval delivered by project and control 
teachers by grade level. As shown in Figure 
2, project teachers were much less punitive 
than control teachers, particularly during 
the first few weeks of school. For the first two 
weeks, project teachers at both elementary 
and middle schools administered fewer than 
one fourth as many punishments as control 
teachers. The rate of punishment declined 
over the school year for all groups. At the 
elementary level, project and control teach- 
ers punished at about the same level during 
the last two weeks, while controls at the 
middle-school level continued to administer
more punishment throughout most of the 
year. In line with these observations, an 
analysis of variance yielded significant ef-
fects for treatment (F = 3.35, df = 17/442,
p < .01). In addition, grade level and weeks
interacted (F = 4.20, df = 17/442, p < .01), 
indicating the greater decline in punishment 
rate at the elementary level than at the 
middle-school level. 

These data demonstrate quite clearly that 
the project teachers’ behaviors were ap- 
propriately modified by the summer and in- 
service training. As a result of these changes, 
students in project classes received a rather 
massive dose of contingent “success” and 
minimal punishment. The alterations of 
classroom environment produced some 
marked changes in pupil behavior as evi- 
denced by the data for disruptions and task 
involvement presented in Figures 3 and 4. 
As indicated in Figure 3, students in project
classes emitted fewer disruptions than chil-
dren in control classes (F = 33.99, df = 1/26,
p < .01). At both elementary and middle-
school levels, disruption in project classes
was approximately one half to one third
that of control classes. The Trials × Treat-
ment interaction was also reliable (F =
2.30, df = 17/442, p < .01). The level of
disruption in project classes declined
slightly over the first few weeks and then
remained relatively stable for the remainder
of the year. In control classes, on the other
hand, there was a gradual drop at mid-year
and then a rise in disruptive level. This was
particularly noticeable in the middle-school
control classes.

FIGURE 2. Mean number of punishments delivered per 15 minutes as a function
of treatment condition and grade level.

FIGURE 3. Mean number of disruptions per 15 minutes over weeks for each
treatment by grade level group.

Project teachers were instructed to ignore 
disruptive behaviors in order to reduce their 
frequency and to reinforce any behaviors 
that evidenced work on assigned academic 
material. The effect of this procedure on 
task involvement is apparent in Figure 4. 
Project students increased from about 75% 
task involved near the beginning of the year 
to more than 90% task involved by mid- 
October. Control students, on the other 
hand, remained from 65% to 75% involved
throughout the school year. A 2 × 2 × 18
(Frequency of Reinforcement Delivery with
Treatment Group × Elementary versus
Middle-School Grade Level × Observation
Interval) analysis of variance indicated a
reliable effect of treatment (F = 60.89, df =
1/26, p < .01) favoring project classes. In
addition, the difference between project and
control classes increased across observation
intervals (F = 3.42, df = 17/442, p < .01).
Thus, in terms of in-class observation, the 
behaviors of both teachers and students 
shifted dramatically in the predicted direc- 
tion in project classes. Such behavioral 
changes are, in themselves, significant, since 
the inner-city classroom has become a
pleasant, success-oriented environment and 
students appear willing, if not eager, learn- 
ers. However, the experiences of the first 
year of the project suggested that simply 
reducing the level of disruption and in- 
creasing task involvement did not guarantee 
changes in academic aptitude or achieve-
ment (Thompson, et al., 1972). Conse-
quently, in the second year, teachers were
encouraged to reinforce evidence of aca-
demic achievement almost exclusively, once
appropriate social behaviors were estab-
lished.

FIGURE 4. Mean percentage of pupils on task for each treatment group at middle
and elementary grade levels.

Academic aptitude. The mean IQ scores 
for the September and May administrations 
of the California Test of Mental Maturity 
and the mean gain for project and control 
classes at each grade level are presented in 
Table 1. Students in project classes out-
gained students in control classes by a
factor of more than 2 (for project, M gain = 5.98;
for control, M gain = 2.51). In fact, project
students at every grade level, with the ex- 
ception of the first, achieved greater IQ 
gains than control pupils. As indicated in 
Table 1, the most impressive change oc-
curred at the fourth grade. The outstanding
performance of this group may be due to 
the fact that 81% of these children were 
exposed to the technique for two consecutive 
years. Over the two-year period, the fourth- 
grade project pupils have gained 20 IQ 
points: from 85.69 in September, 1970, to 
105.56 in May, 1972. 

A 2 × 6 (Treatment × Grade Level)
analysis of variance was performed on gains
in IQ. Project pupils gained reliably more
than controls (F = 12.14, df = 1/602, p <
.01). There was some drop in the amount of
gain from the first to the eighth grade (F =
3.85, df = 5/602, p < .01). Finally, Grade ×
Treatment interacted (F = 5.90, df = 
5/602, p < .01), indicating the large reversal 
in gain at the first-grade level. Project stu- 
dents reliably outgained controls by specific 
comparison for third, fourth, sixth, and 
eighth grades. 
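
As a rough check on the summary figures, the overall mean gains can be approximated by weighting each grade's mean gain in Table 1 by its n. The sketch below also counts error degrees of freedom as total N minus the number of Treatment × Grade cells, which is an assumption about the analysis (a conventional between-subjects design) and is not stated in the text. The variable and function names are illustrative.

```python
# (n, mean IQ gain) for Grades 1, 2, 3, 4, 6, and 8, copied from Table 1.
TABLE_1_GAINS = {
    "project": [(35, -0.11), (38, 8.18), (61, 7.26),
                (36, 13.87), (57, 5.70), (76, 3.04)],
    "control": [(34, 9.58), (44, 5.89), (35, 2.40),
                (43, 1.47), (49, 0.51), (106, 0.22)],
}


def weighted_mean_gain(rows):
    """Mean gain across grades, weighting each grade mean by its sample size."""
    total_n = sum(n for n, _ in rows)
    return sum(n * gain for n, gain in rows) / total_n


def error_df(groups):
    """Assumed error df: total N minus the number of design cells."""
    total_n = sum(n for rows in groups.values() for n, _ in rows)
    cells = sum(len(rows) for rows in groups.values())
    return total_n - cells


project_gain = weighted_mean_gain(TABLE_1_GAINS["project"])
control_gain = weighted_mean_gain(TABLE_1_GAINS["control"])
print(f"project {project_gain:.2f}, control {control_gain:.2f}, "
      f"ratio {project_gain / control_gain:.2f}, error df {error_df(TABLE_1_GAINS)}")
```

Under these assumptions the weighted means come to about 5.96 and 2.51, close to the reported 5.98 and 2.51 (the small difference for project classes is presumably rounding in the tabled means), and the cell count reproduces the 602 error degrees of freedom.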


TABLE 1
MEAN TOTAL IQ (CALIFORNIA TEST OF MENTAL
MATURITY) AT PRE- AND POSTTEST AND MEAN
GAIN AS A FUNCTION OF TREATMENT AND
GRADE LEVEL (September to May)


Grade | Group   |   n | Pretest | Posttest |  Gain
1     | Project |  35 |   98.60 |    98.49 |  -.11
      | Control |  34 |   86.60 |    96.18 |  9.58
2     | Project |  38 |   86.03 |    94.21 |  8.18
      | Control |  44 |   86.11 |    92.00 |  5.89
3     | Project |  61 |   88.28 |    95.54 |  7.26
      | Control |  35 |   90.54 |    92.94 |  2.40
4     | Project |  36 |   91.69 |   105.56 | 13.87
      | Control |  43 |   86.70 |    88.17 |  1.47
6     | Project |  57 |   85.70 |    91.40 |  5.70
      | Control |  49 |   85.76 |    86.27 |   .51
8     | Project |  76 |   71.63 |    74.67 |  3.04
      | Control | 106 |   73.86 |    74.08 |   .22

Academic achievement. The effects of
project treatment on reading achievement 
are presented in Table 2 for the second 
through the eighth grades. On the average, 
project pupils gained .69 years (based upon a 
10-month year) and this doubled the gain of 
.34 years made by controls for the 8-month 
interval. Further, project students gained 
more than controls at every grade level. 

A 2 × 5 analysis of variance was per-
formed on gains in reading grade equivalent.
Project pupils gained reliably more than
controls (F = 167.60, df = 1/528, p <
.001). As for IQ, the amount of gain de-
creased with increasing grade (F = 53.64,
df = 4/528, p < .001). Finally, Treat-
ment × Grade interacted (F = 3.65, df =
4/528, p < .05). Comparisons of project 
and control means at each grade level 
yielded reliable differences favoring project 
students at second, third, sixth, and eighth 
grades (p < .01). 

Children in first-grade classes were given 
the achievement tests only at the end of the 
year. However, since IQ scores were ob- 
tained in September and since project stu- 
dents scored higher on this pretest, an 
analysis of covariance was performed com- 
paring year-end reading scores for first- 
grade students with IQ as the covariate. 
This analysis indicated that project first 
graders had better reading skills at the end
of their first year than did controls (F =
7.85, df = 1/62, p < .01). The mean raw
scores (adjusted scores are in parentheses)
in reading were 60.94 (59.22) for project
pupils and 46.70 (48.71) for controls. These
mean raw scores translate to 1.80 and 1.55
in grade equivalent for project and control
groups, respectively.


TABLE 2
MEAN TOTAL READING GRADE EQUIVALENT ON
CALIFORNIA ACHIEVEMENT TEST AT PRE- AND
POSTTEST AND MEAN GAIN AS A FUNCTION
OF TREATMENT AND GRADE LEVEL


Grade | Group   |   n | Reading, September | Reading, May | Gain
2nd   | Project |  46 |               1.54 |         2.72 | 1.18
      | Control |  40 |               1.46 |         2.11 |  .65
3rd   | Project |  61 |               1.87 |         2.48 |  .61
      | Control |  34 |               2.19 |         2.50 |  .31
4th   | Project |  46 |               3.28 |         3.90 |  .62
      | Control |  23 |               2.74 |         3.31 |  .57
6th   | Project |  57 |               4.55 |         5.05 |  .50
      | Control |  37 |               4.51 |         4.90 |  .39
8th   | Project |  77 |               4.67 |         5.29 |  .62
      | Control | 117 |               4.90 |         5.08 |  .18


TABLE 3
MEAN TOTAL ARITHMETIC GRADE EQUIVALENT AT
PRE- AND POSTTEST AND MEAN GAIN AS
A FUNCTION OF TREATMENT AND
GRADE LEVEL


Grade | Group   |   n | Arithmetic, September | Arithmetic, May | Gain
2nd   | Project |  46 |                  1.45 |            2.01 | 0.56
      | Control |  43 |                  1.47 |            1.87 | 0.40
3rd   | Project |  64 |                  1.76 |            2.34 | 0.58
      | Control |  30 |                  2.14 |            2.56 | 0.42
4th   | Project |  43 |                  3.58 |            4.26 | 0.68
      | Control |  50 |                  3.29 |            3.72 | 0.43
6th   | Project |  55 |                  4.98 |            5.94 | 0.96
      | Control |  28 |                  5.04 |            5.34 | 0.30
8th   | Project |  69 |                  5.32 |            5.83 | 0.51
      | Control | 104 |                  5.55 |            5.95 | 0.40

Results for arithmetic achievement for 
Grades 2 through 8 are presented in Table 
3. Overall, project pupils gained .65 years 
in mathematics (based upon a 10-month 
school year) as compared to a .39 gain for 
control pupils. As for reading, project stu- 
dents outgained controls at every grade 
level. A 2 × 5 analysis of variance yielded
reliable effects for treatment (F = 185.07,
df = 1/522, p < .001), for grade (F = 20.76,
df = 4/522, p < .001), and for Grade ×
Treatment (F = 9.00, df = 4/522, p < .01).
Specific comparisons between project and
control means were reliable at each grade 
level. However, there was a much larger 
gain by project pupils in the sixth grade. 

An analysis of covariance was performed 
on the year-end arithmetic scores of first- 
grade children with pretest IQ scores as the 
covariate. First-grade project children out- 
performed controls in mathematics (F = 
8.99, df = 1/62, p < .01). The mean raw
scores (adjusted scores are in parentheses)
in arithmetic were 59.34 (57.39) for project
students and 43.83 (46.11) for controls. In
terms of grade equivalent, first-grade proj- 
ect pupils scored at 1.70 grade level and 
controls at 1.40 grade level. 


Discussion 


The results of this study are promising.
The authors believe a clear demonstration 
has been made that most, if not all, inner- 
city teachers can learn to use a positive 
contingency management procedure to in-
sure behavior control, accelerated academic
achievement, and probably as a function of 
the latter, substantial tested IQ gain. Fur- 
ther, it appears that teachers can maintain 
use of such a procedure over extended pe- 
riods of time (two years for eight teachers 
in the present study) and that the tech- 
nique works as well or better for children in 
the second year than in the first. In other 
words, there is substantial evidence that re- 
sults of this contingency management pro- 
cedure are enduring and not produced by 
some halo effect. 

As Baer (1971) notes, research workers, 
including the present authors, have clearly 
shown that operant principles can be effec- 
tively applied in the classroom. However, 
previous investigators have not answered 
actuarial questions concerning the propor- 
tion of teachers that can be trained to use 
these principles or the proportion of chil- 
dren that can benefit from the program. The 
present study provides some preliminary an- 
swers to these questions. First, all 16 teachers 
participating in this project reduced their 
delivery of punishment and increased their 
delivery of reinforcement. At a subjective 
level, the authors, who visited these classes 
regularly, believe that all but 1 teacher 
learned to use these procedures effectively. 
Further, in all 16 classrooms, disruptive be- 
havior dropped dramatically, and more sig- 
nificantly, task involvement increased. Fi- 
nally, a simple count of the proportion of 
children showing any gain in reading from 
pre- to posttest indicated that 91% of the 
project children gained, whereas only 72%
of the control children gained. Thus, con-
tingency management appears to work well
for most inner-city teachers and is effective
with most inner-city children. 

Informal observation also suggests many 
side benefits of the study, such as higher 
teacher morale, few if any disciplinary 
referrals by teachers to principals, and im- 
proved relations between school personnel 
and the community (as represented by the 
parents of children in the experimental 
classes). 

The authors believe that the procedures 
of this study offer real hope for inner-city 
education and, they think, for the education 
of all children.* 


One- to three-word modifiers that were judged to have the effect
of making the subject noun phrase in each of a series of sentences 
more denotatively specific were prepared. In Experiment 1, people 
who received sentences containing concrete modifiers recalled sig- 
nificantly more words than people who received unmodified sen- 
tences, provided that in the former case the modifier was part of 
the retrieval cue. In Experiment 2, people exposed to concretely 
modified sentences recalled significantly more words than people 
who had seen redundantly modified sentences. 


A striking fact is that the information em- 
bodied in concrete words is two or three 
times easier to learn than the information 
conveyed by abstract words (Paivio, 1969). 
Indeed, concreteness–abstractness, or “im-
age evoking value,” is the most potent
determiner of the learnability of words that 
has yet been studied. The practical implica- 
tions of this knowledge are not readily ap- 
parent, however. The teacher whose task 
is teaching about zebras can be encouraged 
to believe that his/her students will easily 
learn because “zebra” is a concrete term of 
high image-evoking value. But suppose he/ 
she has the more difficult job of, for in- 
stance, teaching about regulations? What 
technique will increase the learnability of
information to be acquired in connection 
with abstract terms such as "regulations?" 

A plausible answer is to try to increase 
the concreteness of the abstract term by
modifying it with concrete words. Consider
this sentence: 


The regulations annoyed the salesman. 


Would it be learned more readily if modified 
as follows? 


The strict parking regulations annoyed 
the salesman. 


1 The author gratefully acknowledges the assist- 
ance of Valerie Koester, Claire Lieberman, Debra 
Sweet, Steven Sweet, and Peter Zych. 

* Requests for reprints should be sent to Richard 
C. Anderson, Training Research Laboratory, 226 
Education Building, University of Illinois, Urbana, 
Illinois 61801. 


Answering the general form of this question 
was the main purpose of the experiments 
reported in this article. 

Previous research on concretization has 
had mixed results. On the positive side, 
Yuille and Paivio (1969) and Montague and 
Carter (1973) have reported that vivid, 
concrete language facilitates learning from 
connected discourse. The present studies 
differed from these in one important respect: 
In the previous research, the concrete words 
with which the text was augmented became 
part of the to-be-recalled material, whereas 
in the experiments described in this report, 
the response was the same whether or not a 
concrete modifier was included. In other 
words, the present research aimed to show 
an indirect effect instead of a direct one. 

On the negative side, Levin (1972) found 
that children learned somewhat fewer con- 
cretely modified nouns (e.g., spotted turtle) 
than unmodified nouns (e.g., turtle). How- 
ever, Levin’s experiments differed in many 
respects from the present ones. In the first 
place, his nouns were already denotatively 
specific; that is, they were specific enough 
so that the referent of each word could be 
pictured in a line drawing. No more than 
one or two of the (subject) nouns employed 
in the present studies could be represented 
unambiguously in pictures. In the second 
place, Levin employed the technique of free 
recall. Concreteness-abstractness does affect
free recall, but the strongest effects are
obtained when the concrete element serves 
as the cue in a cued-recall task, which was
the arrangement in the studies reported
herein. Furthermore, because free recall was 
required, subjects in Levin's study who 
received the concretized list had twice as 
many words to recall. This fact could ex- 
plain his negative result. 

A subsidiary purpose of Experiment 1 was 
to investigate the effects of variations in re- 
call instructions. Recent research has indi- 
cated that people frequently substitute 
semantically related words for the verbatim 
language of the sentences they are trying to 
recall (Anderson, 1974). The question is 
whether recall instructions given after the 
sentences have been presented alter the 
frequency of substitutions. If the proba- 
bility of a semantically related substitute 
word were to go down following verbatim 
recall instructions and go up following sub- 
stance instructions, this would imply that 
some sentences or sentence constituents are 
coded in memory in both surface and semantic
form. No difference in proportion of semantically
related substitutions as a function
of recall instructions would imply that
there is a single memorial code for each 
verbal element. 


EXPERIMENT 1 
Method 


Subjects. Involved in the experiment were 47 
undergraduates, mostly women, for whom par- 
ticipation was a requirement in an introductory 
educational psychology course. Subjects were 
randomly assigned to conditions when they ap- 
peared for the experiment. 

Materials. Sixteen simple declarative sentences
in the past tense were constructed. Each had a
concrete object noun and a general term for a
subject noun. For each sentence, one- to three- 
word modifiers were prepared which were judged 
to have the effect of making the subject noun 
phrase more denotatively specific, that is, more 
concrete. Some examples of this were as follows: 


The (ivory chess) set fell off the table.

The (huge earth-moving) vehicle missed the og.

The (remote television) control pinched his hand.


Design. The two factors in the experiment were 
(a) type of sentence and retrieval cue and (b) 
type of recall instructions. One third of the sub- 
jects received both elaborated sentences and 
elaborated cues, one third received elaborated
sentences but unelaborated cues, and the remaining
third received both unelaborated sentences and
unelaborated cues. “Elaborated” sentences and cues
included the words modifying the subject noun,
whereas “unelaborated” sentences and cues did not.
Within each of the groups described thus far, half 
of the subjects received verbatim recall instruc- 
tions and half received substance recall instruc- 
tions. Verbatim instructions stressed literal repro- 
duction of the sentences, whereas substance 
instructions indicated an answer would be counted 
correct if it contained the idea, or gist, of the 
sentence, whether or not identical words were used. 
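The factorial layout described above can be sketched in a few lines. This is only an illustration of the design arithmetic, assuming a conventional two-way between-subjects analysis of variance in which each of the 47 subjects contributes one score; the variable names are hypothetical. Under that assumption the error degrees of freedom come out at 41, which agrees with the df = 2/41 reported in the Results.

```python
from itertools import product

# Factor levels as described in the Design paragraph.
SENTENCE_CUE = [
    "elaborated sentence, elaborated cue",
    "elaborated sentence, unelaborated cue",
    "unelaborated sentence, unelaborated cue",
]
INSTRUCTIONS = ["verbatim", "substance"]
N_SUBJECTS = 47  # from the Subjects paragraph

# Every combination of the two factors is one cell of the design.
cells = list(product(SENTENCE_CUE, INSTRUCTIONS))

# Degrees of freedom for a two-way between-subjects analysis of variance
# (assumption: one score per subject per analysis).
df_sentence_cue = len(SENTENCE_CUE) - 1
df_instructions = len(INSTRUCTIONS) - 1
df_interaction = df_sentence_cue * df_instructions
df_error = N_SUBJECTS - len(cells)

print(len(cells), df_sentence_cue, df_instructions, df_interaction, df_error)
```

With 48 subjects the six cells would hold 8 subjects each; 47 subjects leave one cell a case short, which is the situation the unweighted means analysis addresses.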

Procedure. Subjects were run individually in a 
small, sound-deadened cubicle. Learning instruc- 
tions mentioned that a test was to follow but gave 
no details. The sentences, which were typed on 
5 × 8 inch white unlined file cards, were presented
at an eight-second rate paced by beeps from a
tape recorder. Prior to each presentation, the cards
were shuffled. After one exposure of the sentences, 
the subject solved addition, subtraction, and multi- 
plication problems for 48 seconds to prevent recall 
from short-term and, probably, nonsemantic 
memory. Next came the recall instructions, either 
verbatim or substance. Finally, the test was 
presented. The subject noun or the subject noun 
phrase from each sentence served as the retrieval 
cue. The cues were typed on 4 × 6 inch white
unlined file cards, which were shuffled before 
each use. The test was subject paced. The subject 
was instructed to give orally the entire sentence 
if possible but was also encouraged to give frag- 
ments when the rest of the sentence could not be 
remembered. 

Scoring. The recall protocols were scored for 
numbers of verbs and objects recalled verbatim, 
discounting changes in number, tense, determiners, 
and auxiliaries. Also scored were the numbers of
semantically related words substituted in place of 
the verbs and objects in the original sentences. A 
word was counted as semantically related if it was 
a synonym, close superordinate, close cohyponym,
or hyponym of the word it replaced. These cate- 
gories have been defined and illustrated elsewhere 
(Anderson, 1972, 1974). Several people scored the 
protocols. Disagreements were resolved in conference.
Previous research has shown very high
interrater agreement with respect to whether a
word is semantically related or unrelated (Ander- 
son, 1974). 
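The scoring rule can be made concrete with a small sketch. It is not the author's procedure: in the experiment human raters made these judgments, whereas here a tiny lexicon `RELATED` stands in for the raters, and both it and the function names are hypothetical. The sketch also ignores the discounting of changes in number, tense, determiners, and auxiliaries.

```python
# Hypothetical lexicon: for each original word, the substitutes a rater
# might accept, labeled with one of the four relation categories.
RELATED = {
    "annoyed": {"irritated": "synonym", "bothered": "synonym"},
    "salesman": {"man": "superordinate", "clerk": "cohyponym"},
}


def score_word(original, recalled):
    """Label one recalled word against the original word it replaces."""
    if recalled is None:
        return "omitted"
    if recalled.lower() == original.lower():
        return "verbatim"
    # Anything not listed in the lexicon is treated as unrelated.
    return RELATED.get(original.lower(), {}).get(recalled.lower(), "unrelated")


def score_sentence(originals, recalls):
    """Count verbatim and semantically related words for one sentence."""
    verbatim = related = 0
    for original, recalled in zip(originals, recalls):
        label = score_word(original, recalled)
        if label == "verbatim":
            verbatim += 1
        elif label not in ("unrelated", "omitted"):
            related += 1
    return {"verbatim": verbatim, "related": related,
            "total": verbatim + related}


# Verb and object of "The regulations annoyed the salesman."
result = score_sentence(["annoyed", "salesman"], ["irritated", "salesman"])
```

Each sentence contributes at most two points (verb and object), so 16 sentences give the total possible score of 32 noted under Table 1.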


Results 


Table 1 contains mean proportions of 
verbatim, semantically related and total 
words recalled under the various conditions 
that prevailed in the experiment. The data
were analyzed in a 3 × 2 (Type of Sentence-Cue
Combination × Type of Recall Instruction)
unweighted means analysis of
variance. (Unweighted means were used because
one cell was missing a case.) Type of
sentence-cue combination had a significant 
effect when the dependent variable was 
number of words recalled verbatim (F = 
7.17, df = 2/41, p = .002), number of se- 
mantically related substitutes recalled (F = 
9.76, df = 2/41, p = .000), and total num- 
ber of words recalled (F = 12.40, df = 2/41, 
p = .000). Neither type of recall instruc- 
tions nor the Type of Recall x Sentence- 
Cue Combination interaction was significant 
in any analysis. An analysis in which the 
dependent variable was the arcsin transform
of proportion of semantically related to 
total words recalled revealed no significant 
effects. 
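For readers unfamiliar with the arcsin transform, a minimal sketch follows. The article does not say which variant was used; the sketch assumes the common arcsine square-root form, asin(√p), and the function name is hypothetical. The ratios are formed from the group means in Table 1 purely for illustration, whereas the reported analysis would have transformed each subject's own proportion.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Group-level ratios of semantically related to total words (Table 1).
ratios = {
    "elaborated-elaborated": 0.21 / 0.69,
    "elaborated-unelaborated": 0.08 / 0.34,
    "unelaborated-unelaborated": 0.08 / 0.46,
}

transformed = {name: arcsine_transform(p) for name, p in ratios.items()}
for name, value in transformed.items():
    print(f"{name}: {value:.3f} radians")
```

The transform stretches proportions near 0 and 1, which makes their variance less dependent on the mean before an analysis of variance is run.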

Newman-Keuls tests showed that the 
elaborated-sentence-elaborated-cue condition
was significantly (α = .05) better on all
three measures than the elaborated-sen- 
tence-unelaborated-cue condition and sig- 
nificantly better on the semantically related 
words and total words measures than 
the unelaborated-sentence-unelaborated-cue 
condition. The elaborated-sentence-unelab- 
orated-cue and unelaborated-sentence-un- 
elaborated-cue conditions did not differ 
significantly on any measure. The detail 
of the data was generally consistent with the 
overall results. For instance, there were 
more total words recalled under the elab- 
orated-sentence-elaborated-cue condition 
than under the unelaborated-sentence-un- 
elaborated-cue condition for 15 out of the 
16 sentences. 

Of the 187 responses scored as seman- 
tically related substitutes for verbs or ob- 
jects in the original sentences, 56% were 
scored as synonyms, 17% as superordinates, 
15% as cohyponyms, and 12% as hyponyms. 
Discussion 

Experiment 1 gave rather clear evidence 
that concrete modifiers facilitate the learn- 
ing of sentences whose subject nouns are 
general terms. Thus, the results were con- 
sistent with the original notions about con- 
creteness and image-evoking value. But 
there is at least one other explanation of the 
data. Maybe any subject noun modifier 
would facilitate learning, perhaps by increasing
orthographic or phonological distinctiveness.
The purpose of Experiment 2
was to determine whether the functional attribute
of the modifier was its effect on the
denotative specificity of the subject noun
phrase.


TABLE 1
MEAN PROPORTIONS OF WORDS RECALLED

                                          Words
Sentence cue                 Verbatim  Semantically related  Total
Elaborated-elaborated          .48             .21             .69
Elaborated-unelaborated        .26             .08             .34
Unelaborated-unelaborated      .38             .08             .46

Note. The total possible score was 32.

Subjects presented with elaborated sen- 
tences did poorly when they received un- 
elaborated cues. Apparently, a vague cue 
was inadequate for the person to retrieve the 
representation which had been stored when 
the sentence was learned. Probably what 
happened was that the person tended to give 
a different interpretation to the general term 
alone than when it was concretely modified. 
This was unfortunate from a practical per- 
spective for, to stretch a point beyond the 
data, the implication was that the concrete 
case employed to illuminate a generalization 
has to be reinstated if a person is to have 
access to his memory for that generalization, 

It is quite interesting that people who 
received verbatim and substance recall in- 
structions performed similarly on all mea- 
sures including, especially, the proportion of 
semantically related to total words recalled 
(F < 1.0). This must mean that people do 
not have both a coding for the literal surface 
form of a sentence and a coding for its 
meaning. Otherwise, they would edit their 
production when given verbatim recall in- 
structions so as to conform more closely to 
the original, whereas in all likelihood they 
would edit their output away from the origi- 
nal when given substance recall instructions 
so as to employ wording that seemed apt 
and tasteful. The data are consistent with 
the view that every sentence (which is 
learned at all) is represented in memory in
semantic form, though this theory is somewhat
strained to explain why such a high
proportion of words are recalled verbatim 
(see also Anderson, 1974). In this view, it 
has to be assumed that verbatim words ap- 
pear in recall only when a person just hap- 
pens to select the same lexical items when 
decoding the semantic representation into 
language. Another view consistent with the 
data is that some sentences or sentence con- 
stituents are coded literally, whereas the 
rest are coded semantically. 


EXPERIMENT 2 


Method 


Subjects. The subjects were 28 undergraduates, 
mostly women, who participated to fulfill a require- 
ment in an introductory educational psychology 
course. 

Materials. Sixteen pairs of sentences were constructed.
Within each pair, the sentences were
identical except for one to three words which 
modified the subject noun. One of the sentences 
contained a concrete modifier of the same type 
employed in Experiment 1. The other entailed a 
redundant modifier, so called because it was judged 
to have little or no impact on the denotative 
specificity of the subject noun phrase. Consider 
the sentence, 


The official regulations annoyed the salesman. 


Most regulations are official. The class of official 
regulations is not much narrower than the class of 
all regulations. Other illustrations are as follows: 


The oil-pressure gauge was covered with dust
versus The measuring gauge was covered with
dust.

The obscene exclamation embarrassed the 
nun versus The excited exclamation em- 
barrassed the nun. 

The sports periodical provided the informa- 
tion versus The regular periodical provided 
the information. 


Design. The two factors in this experiment were 
(a) sentence list and (b) type of modifier. The 
same sentences, except for the modifiers, appeared 
in each list. Half of the sentences within each list 
contained concrete modifiers, half contained 
redundant modifiers. If a sentence included a con- 
crete modifier in the first list, the parallel sentence 
in the second list included a redundant modifier. 
In other words, with respect to the factor of 
principal interest—type of modifier—this was a 
within-subjects, or mixed-list, design. 
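The two-list arrangement can be sketched as follows. This is an illustration under stated assumptions: the article does not say how sentences were allotted to modifier type within a list, so the random split and the function name `build_lists` are hypothetical. The first three sentence pairs are taken from the Materials paragraph; the fourth pairs the "official regulations" sentence with the "strict parking regulations" sentence from the introduction, which is an assumed pairing.

```python
import random


def build_lists(pairs, seed=0):
    """Split (concrete, redundant) sentence pairs into two mirrored lists.

    Each list holds half concrete and half redundant versions; a sentence
    that is concrete in the first list is redundant in the second.
    """
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    concrete_in_first = set(indices[: len(pairs) // 2])

    list_a, list_b = [], []
    for i, (concrete, redundant) in enumerate(pairs):
        if i in concrete_in_first:
            list_a.append(("concrete", concrete))
            list_b.append(("redundant", redundant))
        else:
            list_a.append(("redundant", redundant))
            list_b.append(("concrete", concrete))
    return list_a, list_b


pairs = [
    ("The oil-pressure gauge was covered with dust.",
     "The measuring gauge was covered with dust."),
    ("The obscene exclamation embarrassed the nun.",
     "The excited exclamation embarrassed the nun."),
    ("The sports periodical provided the information.",
     "The regular periodical provided the information."),
    # Assumed pairing, not stated as a pair in the article.
    ("The strict parking regulations annoyed the salesman.",
     "The official regulations annoyed the salesman."),
]

list_a, list_b = build_lists(pairs)
```

Because every subject sees both modifier types, type of modifier is compared within subjects, while list remains a between-subjects factor.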

Procedure. Subjects were run individually in a 
small, sound-deadened cubicle. The sentences,
which were typed on 5 × 8 inch white unlined file
cards and randomized by shuffling before each 
use, were presented at an eight-second rate paced 
by beeps from a tape recorder. After one exposure 
to one of the two lists of sentences, the subject 
solved addition, subtraction, and multiplication 
problems for 48 seconds to prevent recall from 
short-term memory. Finally, the subject received 
the test. On each page of a 2¾ × 8½ inch answer
booklet was mimeographed the subject noun 
phrase (always the entire phrase) of one of the 
sentences. The subject was instructed to write the 
rest of the sentence, all of it if possible, but any 
word or phrase he could recall if the whole sentence 
could not be remembered. Everyone received sub- 
stance recall instructions. To reduce the likelihood 
of systematic position or sequence effects, the test 
booklets were collated in four different random 
orders and were assigned to subjects about equally 
often at random. The test was subject paced. 

Scoring. The protocols were scored the same way 
as in Experiment 1. 


Results 


In Table 2 appear the mean proportions 
of words recalled. Analyses of variance 
showed that type of modifier had a signi- 
ficant effect on number of words recalled 
verbatim (F = 15.81, df = 1/26, p = .001), 
number of semantically related substitute 
words (F = 14.36, df = 1/26, p = .001),
and total words recalled (F = 27.87, df = 
1/26, p = .000). 

List was nowhere a main effect (all Fs <
1.0), but in every analysis except the one
involving total words, the List × Type of
Modifier interaction was significant. These 
interactions occurred simply because the 
substitution of semantically related words 
was more likely in one list than in the other. 
Previous work has suggested that the prob- 
ability of substitution is a function of the 
aptness of the wording of the original sentences
and the availability of semantically
equivalent alternative wordings (Anderson,
1974). Apparently the two lists differed in
these or other relevant respects.


TABLE 2
MEAN PROPORTIONS OF WORDS RECALLED

                          Words
Modifier     Verbatim  Semantically related  Total
Concrete       .34             .21             .55
Redundant      .22             .11             .33

Note. The total possible score was 16.

Concrete modifiers led to better performance
than redundant modifiers in 12 of the
16 sentences. There was little difference with 
the remaining 4 sentences. Of the 144 se- 
mantically related substitute words, 44% 
were scored as synonyms, 26% as super- 
ordinates, 17% as hyponyms, and 14% as 
cohyponyms. 


GENERAL DISCUSSION 


The data show that concrete modifiers 
strongly facilitate the learning of sentences. 
The general educational implication for the 
teacher, author, and curriculum developer 
is to be as specific and concrete as possible. 
The results were positive enough to en- 
courage research on other, more elaborate 
concretization techniques such as the use of
metaphor, analogy, and physical models to 
represent systems of abstract concepts. 

There was facilitation only when the 
entire, concretely modified subject-noun 
phrase served as the retrieval cue but not 
when the subject noun alone was the cue. 
Under the latter condition, it is quite likely 
that the subject noun was encoded differ- 
ently at the time of original exposure and at 
the time of testing (cf. Martin, 1968; 
Wicker, 1970). It was clearly shown in Ex- 
periment 2 that the effective variable was 
concretization and not merely the presence 
of modifying words. Bower (1971) has men- 
tioned, in passing, an unpublished experi- 
ment which also found no facilitation from 
redundant words on the stimulus side in a 
paired-associate task. 

Throughout this paper the terms concrete, 
specific, and vivid have been used inter- 
changeably. It should be noted that there 
is some indication that concreteness-ab- 
stractness and specificity-generality may be 
separate, though of course correlated, dimensions
(Paivio, 1971, p. 83). Possibly
vividness is also a distinct factor, though it 
appears to affect learning in the same way 
as concreteness/specificity. Kirchner (1969)
inserted “vivid” or “dull” adjectives into 
appropriate and identical slots in a narra- 
tive. She found that people who heard the 
vivid narrative recalled more nouns than 
those who heard the dull narrative but, 
surprisingly, there was no difference in the 
recall of the adjectives themselves. 
SEVEN ASPECTS OF TEACHING CONCEPTS


M. I. CHAS. E. WOODSON
University of California, Berkeley 


Seven steps (defining, identifying relevant attributes, identifying
irrelevant attributes, listing exemplars, listing nonexemplars,
describing the domain of the concept, and using analogies) in the
instruction of concepts were isolated and their relative effectiveness
examined by using each to teach verbal concepts (e.g., the “meaning”
of Chinese characters) to college students. Subjects learned
seven concepts at once in a within-subjects design. Instructions
involving the definition or identification of the relevant attributes were
found to be the most effective in terms of errors. Several kinds of
generalization errors were isolated, and these were found to be
associated with different instructional steps.


Most concept-learning experiments have 
involved displaying to the learner a collec- 
tion of positive and negative instances from 
which he is to infer the concept being 
taught. On the other hand, when most of us 
set out to teach a concept to a learner in 
his natural habitat we attempt to take as 
much advantage as possible of the com- 
munication skills possessed by the instructor 
and the learner. This type of concept-learn- 
ing task may be called the instructional 
paradigm to distinguish it from the reception 
paradigm (e.g., Haygood & Bourne, 1965),
in which the instructor determines a se- 
quence of positive and negative instances 
that are presented one at a time, and the 
selection paradigm (e.g., Bruner, Goodnow, 
& Austin, 1956), in which the learner selects 
which pattern comes next and all possible 
positive and negative instances are usually 
available. 

Several instructional steps can be identi- 
fied in the instructional paradigm situation. 
Among these are the (a) stating of a definition,
(b) instructions intended to identify
the relevant attributes of the concept, (c)
instructions intended to identify the irrelevant
attributes, (d) showing of examples of
the concept, (e) showing of nonexamples of 
the concept, (f) description of the domain 
of the concept, and (g) use of analogies to 
describe the concept. This experiment was 
performed to compare these seven instruc- 
tional steps in overall effectiveness in teach- 
ing concepts. 

Evaluation of different instruction meth- 
ods raises the question of amount of trans- 
fer. Classification practice may build up 
specific associations that appear to be
concept learning on a classification task but
lack the generality needed for a test of de- 
fining. Since a set to learn concepts facili- 
tates concept learning (Reed, 1946), college 
subjects who are used to studying concepts 
may show even more transfer to testing 
tasks quite different from the instructional 
tasks. The question of transfer can be investigated
by giving a variety of tests to
all subjects. 

This study proposes, therefore, to construct
four measures of concept learning, to
instruct concepts by seven alternative
instructional methods within the instructional
paradigm, and to compare achievements on
all four tests following the instruction.


METHOD

Subjects


The subjects were 14 undergraduate students
who responded to signs announcing a paid experiment.
The data from two subjects were dropped
from the analysis because they scored unusually
high and admitted to having been coached by a 
previous participant in the experiment. 


Design 


Each subject learned seven concepts by one of 
seven methods of instruction. The concept- 
method pairs were counterbalanced so that among 
the 14 subjects each pairing occurred twice. 
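One way to obtain such counterbalancing is a cyclic Latin square repeated twice. The article does not describe the actual assignment scheme, so the sketch below is only an assumption that satisfies the stated constraint; the function name `assignment` is hypothetical, and the concept and method labels are those listed later in the Method section.

```python
from collections import Counter

CONCEPTS = ["wood", "round", "small", "blue", "hard", "valuable", "eatable"]
METHODS = [
    "definition", "relevant attributes", "irrelevant attributes",
    "exemplars", "nonexemplars", "domain", "analogy",
]


def assignment(subject):
    """Pair each concept with a method by shifting the method list."""
    n = len(METHODS)
    return {concept: METHODS[(i + subject) % n]
            for i, concept in enumerate(CONCEPTS)}


# Fourteen subjects cycle through the seven possible shifts twice.
plans = [assignment(subject) for subject in range(14)]

# Tally how often each concept-method pairing occurs across subjects.
counts = Counter((concept, method)
                 for plan in plans
                 for concept, method in plan.items())
```

Each subject meets every method exactly once, and each of the 49 pairings is used by exactly two subjects, so differences among methods are not confounded with differences among concepts.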


Task and Procedure 


Subjects were instructed individually by instructions
typed and drawn on IBM cards. Subjects
were informed they would be paid 1¢ for
each point scored on a test with 280 possible points 
over the material covered. 

Seven concepts were taught at the same time. 
A unit of instruction consisted of a Chinese char- 
acter and an English word drawn or typed on an 
IBM card. Each successive block of seven units 
contained one unit on each of the seven concepts. 
Within each block, the order was randomized 
independently for each subject. The rate of pres- 
entation was approximately one per 10 seconds. 
During the first two blocks, while the instruction 
card was in view of the learner, the experimenter 
read aloud the instruction appropriate to the 
instructional condition. After the two blocks 
involving treatment presentations, four instances 
and four noninstances, each labeled as an example 
or nonexample, were presented in the next eight 
blocks. 
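The blocked presentation schedule can be sketched as below. This is an illustrative reconstruction, assuming two treatment blocks followed by eight instance blocks (ten blocks in all) and an independent shuffle within each block; the function name `presentation_order` and the seed are hypothetical.

```python
import random

CONCEPTS = ["wood", "round", "small", "blue", "hard", "valuable", "eatable"]
SECONDS_PER_UNIT = 10  # "approximately one per 10 seconds"


def presentation_order(n_blocks=10, seed=None):
    """Return a list of blocks, each a fresh random order of the concepts."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = CONCEPTS[:]      # one unit on each of the seven concepts
        rng.shuffle(block)       # order randomized within the block
        order.append(block)
    return order


blocks = presentation_order(seed=1)
n_units = sum(len(block) for block in blocks)
approx_seconds = n_units * SECONDS_PER_UNIT
print(n_units, approx_seconds)
```

Under these assumptions a training session comprises 70 units and lasts roughly 700 seconds, a little under 12 minutes.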

During training, no overt response was required 
of the subject, and the subject was instructed 
not to ask questions. After training there was a
10-minute interval before testing during which 
subjects filled out a survey that was a part of 
another experiment, and this interval was followed 
by a 28-item test on the concepts taught. 


Concepts Taught 


The concepts taught were “meanings” of the 
Chinese characters wood, round, small, blue, hard, 
valuable, and eatable. Sets of English words that 
involved a common concept were prepared as 
instances.

Training instances were English words that 
described things that were examples or nonex- 
amples of the concepts being taught. For example, 
for the character wood, examples included lumber,
tree, forest, board, baseball bat, and branch.
Nonexamples included plastic, flowers, grass,
purse, pliers, and apple.


Instructional Steps


The instructional steps used were as follows: 

1. Definition: “This character refers to the 
round shape of objects.” 

2. Identification of relevant attributes: “The 
meaning of this character has something to do
with the material of which things are made.” 

3. Identification of irrelevant attributes: “The
meaning of this character has nothing to do with
color.” 

4. Listing exemplars: “The following are things 
to which this character applies: most houses, 
most telephone poles.” 

5. Listing nonexemplars: ‘The following are 
things to which this character does not apply: 
fish, automobiles, and air." 

6. Describing domain of concept: “This char- 
acter applies to things which have a shape." 

7. Use of analogies: "The meaning of this 
character (wood) is to this instance (tree), like 
metal is to an automobile.” 


Measures of Learning 


Four measures of learning were used for each 
of the seven concepts. Three required written 
responses, while the fourth involved the classifica- 
tion of 10 not previously encountered instances as 
examples or nonexamples of the concept. 

1. Definition. The subject was asked to “explain
what the following character means.”

2. Exemplars. The subject was asked to give 
several examples of things to which the character 
applies. 

3. Nonexemplars. The subject was asked to 
give several examples of things to which the char- 
acter does not apply. 

4. Classification. The subject was asked to 
classify as examples or nonexamples a sample of 10
possible instances (5 examples, 5 nonexamples)
that the subject had not encountered in training. 

The order of the 28 test items (4 item types for 
each of the concepts) was randomly determined 
independently for each subject with the restriction 
that each block of 7 items contain all 7 concepts. 
The classification items were scored by counting 
the number classified correctly. The chance expec- 
tation of scores on this measure is about 5. 
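The chance expectation and the test maximum follow from simple arithmetic, sketched below. The sketch assumes that a guessing subject picks "example" or "nonexample" with equal probability on each item, which the article does not state; the function name `prob_at_least` is hypothetical.

```python
from math import comb

N_ITEMS = 10     # instances to classify per concept
P_GUESS = 0.5    # assumption: two equally likely responses when guessing

# Expected number correct under pure guessing.
expected_correct = N_ITEMS * P_GUESS


def prob_at_least(k, n=N_ITEMS, p=P_GUESS):
    """Binomial probability of k or more correct out of n by guessing."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


# Four item types for each of seven concepts, each worth up to 10 points.
max_points = 4 * 7 * 10

print(expected_correct, max_points, round(prob_at_least(8), 4))
```

The maximum of 280 matches the number of possible points on which payment was based.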

The definition, exemplar, and nonexemplar 
items required some judgment in scoring. They 
were scored by a person experienced in concept- 
learning studies, but unfamiliar with the experi- 
mental conditions involved. All answers to a 
particular type of item were displayed to him at 
once, and he was asked to classify each answer on 
an arbitrarily chosen 0-10 scale of relative effec- 
tiveness with a mean of approximately 5. His 
classifications were reviewed a number of times 
before the scoring was determined. The experi- 
menter also examined the classifications and 
concluded there was close agreement between 
his and the rater’s judgment. 


RESULTS

The concepts proved to be rather difficult,
with a mean score of 5.3 out of 10
for the retention-transfer subtests. Table 1
gives the means across subjects and concepts
for each of the instructional steps on
the transfer-retention measures.


TABLE 1
MEAN TRANSFER-RETENTION SCORE ON EACH OF THE MEASURES OF LEARNING
FOR EACH INSTRUCTIONAL CONDITION

                                    Measure of learning
Instruction step         Definition  Exemplar  Nonexemplar  Classification
Definition                   Ti        7.6        9.0            8.0
Relevant attributes          8.4       7.8        8.6            8.6
Irrelevant attributes        3.1       1.7        4.7            5.0
Exemplars                    7.4       8.0        5.3            6.5
Nonexemplars                 3.0       1.3        7.9            5.1
Domain                       6.2       7.5        6.5            7.2
Analogy                      6.8       6.0        5.8            5.8


A multivariate analysis of variance,
Concept × Treatment, with repeated
measures on individuals and four measured
variables, using Roy’s largest root criterion
(Morrison, 1967), indicated the seven
instructional treatments differed (θ = .71,
s = 4, m = 1/2, n = 345, p < .01). Inferences
about individual treatments and
measures were made by ordering the means
within measures and using individual t tests
with α = .01 distributed among all possible
pairwise comparisons within each dimension
by means of the Bonferroni inequality
procedure (Miller, 1966).
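The Bonferroni allocation can be illustrated with a short calculation. It assumes that "all possible pairwise comparisons within each dimension" means every pair of the seven instructional steps within one measure of learning, and that the family-wise α of .01 is divided equally among them; the article does not report the per-comparison level, so the figure computed here is a sketch rather than a reported value.

```python
from itertools import combinations

METHODS = [
    "definition", "relevant attributes", "irrelevant attributes",
    "exemplars", "nonexemplars", "domain", "analogy",
]
FAMILY_ALPHA = 0.01

# Every unordered pair of instructional steps within one measure.
pairs = list(combinations(METHODS, 2))

# Bonferroni inequality: testing each pair at alpha / m keeps the
# family-wise error rate at or below alpha.
per_comparison_alpha = FAMILY_ALPHA / len(pairs)

print(len(pairs), round(per_comparison_alpha, 6))
```

Seven treatments give 21 pairs, so each t test would be evaluated at roughly .0005 under this reading.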

The relative effectiveness of the several 
methods of instruction is summarized in 
Table 2. Methods of instruction are listed 
in order of effectiveness as measured by the 
retention tests, and methods that were not
distinguished statistically are enclosed in
brackets. Overall, the specification of the 
relevant attributes or a definition was most 
effective. 

Examining the four measures of learning
for each concept a subject learned, it is
possible to classify the learning of each
concept into a number of categories on the
basis of the generalization apparent. These
include: correct generalization in which the
instances and noninstances are correctly
classified; over generalization in which
many noninstances are classified as instances;
under generalization in which many
instances are classified as noninstances; no
generalization in which only the training
instances and noninstances are recalled;
and mis-generalization in which many instances
and noninstances are misclassified.
Table 3 summarizes the generalization ob- 
served under each instructional condition. 

The classification of generalization errors 
used here was arrived at in a somewhat post
hoc manner, that is, by examination of the
outcomes and logical analysis considering 
these outcomes. In such a situation, statistical
tests of hypotheses about the occurrence
of these types of errors are not appropriate, 
as classifications arrived at may be due in 
part to the pattern of outcomes observed. 
It appears, however, from examination of 
the error frequencies, that certain generaliza- 
tion errors are associated with particular 
methods of instruction. For example,
instruction about irrelevant attributes or
nonexemplars appears to lead to over generalization,
while instruction about the domain
appears to lead to under generalization.


TABLE 2
INSTRUCTIONAL CONDITIONS IN RANK ORDER ACCORDING TO EFFECTIVENESS FOR EACH
MEASURE OF LEARNING

                                   Measure of learning
Definition             Exemplar               Nonexemplar            Classification
Relevant attributes    Exemplars              Definition             Relevant attributes
Definition             Relevant attributes    Relevant attributes]   Definition
Exemplars              Definition             Nonexemplars           Domain
Analogy                Domain                 Domain                 Exemplars
Domain                 Analogy                Analogy                Analogy
Irrelevant attributes  Irrelevant attributes  Exemplars              Nonexemplars]
Nonexemplars           Nonexemplars           Irrelevant attributes  Irrelevant attributes

Note. Conditions in brackets are not distinguished statistically.


DISCUSSION 


Much of the tradition in the study of 
concept learning has been restricted to the 
study of Weigl-type problems with well- 
defined stimulus dimensions. Hovland (1952) 
and Hovland and Weiss (1953) argued that 
it was impossible to evaluate the informa- 
tion provided by instruction if the task did 
not involve a known number of attributes. 
This analysis has inspired a large number 
of studies using stimulus patterns with a 
small number of discrete and obvious levels,
studies that are referred to as concept learn- 
ing but whose generalizability and ap- 
plicability to human concept learning in the 
natural habitat are very suspect. The em- 
phasis of these studies has unfortunately 
been upon what the experimenter could 
detect was included in the instruction, but 
the critical factor is what the learner can 
detect and thereby exhibit by his behavior. 
In other words, instruction is not the ex- 
hibiting of information, but it is trans- 
mission to and reception by the learner. 
Reception is the essential characteristic, not 
display. 

Furthermore, the question is not really 
one of whether the experimenter can analyze 
the information content of the task, but 
whether the learner can do so in the in- 
structional situation. If the information 
content of the task is easily analyzed, 
learners may well use different learning 
strategies than if the information content is 
not open to immediate analysis. Differences 
in the information communicated by in- 
structional methods need not be measured to 
compare methods. Differences among in- 
structional methods, as measured by what 
the learner can do, are the appropriate 
measure of instructional effectiveness. 

In the present study, the instructional 
steps were probably not equivalent in terms 
of information provided, but the approach 
was to avoid concern with equating in- 
structional methods in terms of the amount 
of information available in the eyes of the 
experimenter, to equate instruction in terms 
of time and effort on the part of the learner, 
TABLE 3


INFERENCES REGARDING THE GENERALIZATION
OBSERVED IN EACH INSTRUCTIONAL CONDITION,
BASED ON AN EXAMINATION OF ALL FOUR
MEASURES OF LEARNING


Instruction step Correct | Over | Under | None | Miss 


Definition 11 1 1 0 1
Relevant attributes 9 0 2 2
Irrelevant attributes 0 9 2 1 4
Exemplars 6 1 1 4 2
Nonexemplars 0 7 2 0 5
Domain 7 1 5 0 1
Analogy 3 6 1 1 3 


and to consider any differences in amount of 
information communicated, as measured by 
what the learner achieves, the basis of com- 
paring instructional methods. 

The results indicated the seven instruc- 
tional steps were of different effectiveness 
and that these differences were detectable 
by relatively simple tests after a short 
period of instruction. It may be that the 
learning of several concepts at once ac- 
centuated these differences as compared to 
other experiments in which only one concept 
at a time was learned. 

Since the subjects in this study were 
college students, it was reasonable to infer 
that they were rather experienced concept 
learners. It may be that the relative im- 
portance of these instructional acts de- 
pends somewhat upon the subjects’ ex- 
perience in dealing with them. Such a 
situation was found in the case of dealing 
with negative instances. 

The measure of learning does seem to be 
related to the method of instruction, al- 
though the differences are not as striking as 
might be expected. Transfer from one 
method of instruction to dissimilar testing 
procedures may be expected due to the 
subjects being highly practiced concept 
learners who have received much training 
intended to facilitate transfer. 

These results imply that the identification 
of the relevant attributes (average rank of 
1.5 among the methods used, see Table 2) 
is the most effective instructional strategy, 
followed closely by definition (average rank 
of 2.125 among the methods used). This 
may be due in part to the character of the
task used, which may have been primarily 
a concept-identification task for most sub- 
jects. 

This study suggests a number of questions 
that are unanswered by the present data. 
Do persons differ as to which instructional 
methods are more effective? Are these dif- 
ferences great enough to make individual 
instructional prescriptions beneficial? What 
combinations of instructional methods are 
the most effective in terms of learner and 
instructor, time and effort? Are the dif- 
ferences in the instructional effectiveness 
among methods due to experience of the 
subject or the characteristics of the methods 
themselves? In the face of detected errors of 
generalization, which method of instruction 
is most effective? The method associated 
with low numbers of generalization errors 
of that particular type may be the most 
effective for remedial instruction. The 
within-subjects design used in this study
may well not be the most effective way of 
going about answering these questions. 
TEACHING FACTS ABOUT DRUGS:
PUSHING OR PREVENTING? 


RICHARD B. STUART


University of Michigan 


Nine hundred thirty-five seventh- and ninth-grade students in two 
suburban junior high schools were randomly assigned to experimental 
drug education or control groups. A 10-session fact-oriented drug 
education program was offered in two formats (student or teacher 
led) and with three sets of contents (lesser drugs only, major drugs 
only, or both sets combined). The program was evaluated through 
the use of a self-report measure of drug information, drug use, and 
attitudes relating to drug use. Results indicated that relative to con- 
trols, subjects receiving drug education significantly increased their 
knowledge about drugs, their use of alcohol, marijuana, and LSD, 
and their sale of the latter two drugs, while their worry about drugs 
decreased. Neither format nor content factors were shown to influ-
ence the results of the program. When the interaction among drug use,
knowledge, and worry was examined, it was shown that use tends to
rise as a function of the combination of increased knowledge and
reduced worry. This combination of factors was not sufficient as a
predictor of drug use, however, suggesting the influence of other,
untested factors. Within the limitations posed by several qualifica- 
tions, it is suggested that these findings support the notion that drug 
education may not necessarily be positive in its effect, indicating the 
need for precise measurement of program outcomes. 




There is a growing belief that the use of 
some drugs such as alcohol and soft hal- 
lucinogens is increasing at all strata of 
society. In response to this, some jurisdic-
tions have sought to control drug use 
through stricter law enforcement, while 
others have responded by decriminalizing 
the possession and/or use of some drugs
(Stachnik, 1972). Whatever the legal re- 
sponse to the presumed rise in drug use, 
there has been a widespread increase in
reliance upon drug education as a preventive 
measure. At least 24 states now require drug 
education in the public schools (National
Commission on Marijuana and Drug Abuse, 
1972), with states such as Michigan re- 
quiring “education in health related topics 
with special reference to the nature of 
tobacco, alcohol and narcotics and their 
effect on the human system [Michigan 
State School Code, 1955]." 

Drug education has taken many forms 
involving varying permutations of settings, 
targets, methods, educators, and 
contents. Settings have ranged from reliance


in centers. The goals set typically call for
that minimize physical or emotional health
hazards. The targets have been nonusers 
who are believed to be potential users, ex- 
perimental users, committed users, or the 
parents, teachers, employees, or friends of 
members of any of these groups. The meth- 
ods used have included fear induction, ex- 
hortation, authority- and entertainment- 
based appeals, role playing, encounter 
groups, and formal lectures (Richards, 
1970). The educators have been drawn from 
throughout the ranks of the professional 
and lay communities, from former drug 
users and from among the target populations 
themselves. Finally, the contents have 
ranged from reliance upon unbiased factual 
presentation of drug facts through the dis- 
cussion of the social and legal ramifications 
of drug use to the introduction of materials 
that reflect a very strong antidrug bias. 

With such a variety of programs, it is
reasonable to expect that some programs
have more positive results than others. As 
an optimal outcome, the programs might 
succeed in eliminating or temporizing drug 
use. As a midrange outcome, the programs 
might change attitudes or the level of drug 
information, which has no direct bearing 
upon drug use. As a negative outcome, the 
programs might exacerbate drug use by (a) 
providing students with sufficient informa- 
tion to facilitate the initiation of use; (b) 
providing students with facts that overcome 
the prejudices that had been inhibiting use; 
(c) desensitizing students about drugs 
through repeated discussion of drug con- 
cepts in environments such as schools, 
which have been traditionally disassociated 
from drug use; (d) leading students to think 
of themselves as potential drug users merely 
by virtue of their having been included in 
drug education programs; (e) changing 
attitudes that were the bastion of defense 
against drug use; or (f) occasionally in- 
cluding inaccurate or biased information, 
which undermines the credibility of the 
basic educational message. 

The risk of negative effects of drug educa- 
tion is suggested by evidence that shows 
that a relatively high level of knowledge 
about drugs is associated with higher levels 
of drug use. In a survey of adolescents in 
four Michigan communities, Stuart and 
Schuman (1972; see also Swisher, Crawford,
Goldstein, & Yura, 1971) reported:


The non-users of every type of drug were found to 
have lower drug information scores than did the 
users, the difference being statistically significant 
for every drug except alcohol . . . . Further, there 
was a strong relationship between past frequency 
of use of a drug and drug information scores for all 
drugs except alcohol [Stuart & Schuman, 1972, 
p. 139]. 


While these correlational findings do not 
demonstrate that increases in drug infor- 
mation necessarily cause or even catalyze 
increases in drug use—for they may follow 
rather than precede the initiation or in- 
tensification of use—they do show that drug 
information per se does not necessarily in- 
hibit use. 

Because of the possibility that drug 
education may actually be associated with 
intensified use, it is particularly unfortunate 
that the surfeit of drug education efforts has 
been associated with a paucity of program 
evaluations. Few states require any evalua- 
tion at all. In its exhaustive search, the Na- 
tional Commission on Marijuana and Drug 
Abuse (1972) did not report one evaluation 
procedure that adequately accounted for 
the effects of the countless programs that it 
cited (see also Ford Foundation, 1972). 
Those few states that do require any evalua- 
tion at all tend to be concerned with issues 
of secondary importance such as “the ques- 
tion of the age at which instruction should 
begin, how often programs should be 
repeated, and whether programs should be 
taught by medical and law enforcement 
personnel .... [California State Assembly, 
1967].” When use of drugs has been used as 
a dependent variable (e.g., California De- 
partment of Education, 1970), the con- 
sequences of drug education have not been 
encouraging. Nevertheless, there is little 
official questioning of the desirability of 
drug education, with the focus of the sparse 
evaluations likely to be placed on matters 
of secondary importance such as consumer 
appeal. 

The present study is an effort to measure 
the primary effects of drug education—its
impact upon use of drugs and attitudes
relating to use—as well as evaluating the
effects of differing instructional patterns 
The goal of this program was the prevention
or reduction of use among the junior-high- 
school-age targets. The settings were two 
junior high schools in an upper-middle-class 
suburban university community. The meth- 
ods included lectures by teachers and stu- 
dent presentations designed to communicate 
facts about the physiology and phar- 
macology of drug use in association with its 
legal, social, and psychological ramifications. 
The two teachers were women in their mid- 
twenties, one of whom was a registered 
pharmacist with extensive experience in 
counseling committed drug users, while the 
other was an experienced junior and senior 
high school teacher who had worked in 
inner-city schools. Both were casual in ap- 
pearance and relaxed in style. Three content 
divisions were used in this study. Some 
students received education about the lesser 
drugs—alcohol, tranquilizers such as Valium, 
marijuana, hashish, nicotine, and caffeine 
(Type A content); others received education 
about the major drugs—LSD and other hard 
hallucinogens, amphetamines, barbiturates, 
and narcotics (Type B content); while still 
others received education about all of the 
drugs (Type AB content). All data were 
collected by the two teachers, whether from 
experimental or control subjects. 

The dependent variables in this study 
were measured by a single instrument that 
assessed knowledge about drugs, self-report 
of present and past use and the sale of drugs, 
and three attitudes believed to be associated 
with drug use—worry about drug effects, 
acceptance of drug use as a nondeviant 
action, and alienation from more conven- 
tional sources of satisfaction. 


METHOD 


Subjects 


The subjects were 935 junior high school boys 
(55.5%) and girls (44.5%) of whom 63.5% served 
as experimental subjects and 36.5% served as con-
trols. Five hundred and nine seventh graders were
drawn from science and unified studies classes and 
426 ninth graders were drawn from civics classes. 
These required classes were chosen because they 
were formed on an alphabetical rather than an 
ability-interest basis, thereby affording a repre- 
sentative cross section of the seventh- and ninth- 
grade students in the school. These classes were 
then randomly assigned to either the experimental 
or control conditions, and parents were duly noti-
fied of the assignment of their children. Fewer than 
1% of the parents took advantage of the option to 
have their children withdrawn from the program 
prior to its inception, and only a small number of 
students were withdrawn after the program began. 
The first-semester classes provided pre- and post- 
experimental data and follow-up data, and the 
second-semester classes provided pre- and post- 
experimental data only. All data were collected 
during October, January, and April of the 1971- 
1972 academic year. 


Instrument 


A single instrument measuring three clusters of
factors provided the data for this investigation. 
The first portion of the instrument asks for self- 
report of the present and past use of seven classes 
of drugs: alcohol, soft hallucinogens, hard hallu- 
cinogens, stimulants, depressants, narcotics, and 
solvents. For example, in measuring present use, 
respondents are asked to indicate whether they do 
not use a particular drug, use it once or twice a 
year, once or twice a month, once or twice a week, 
or daily. These answers can then be quantified 
simply by assigning numerical values from 1 
(nonuse) to 5 (daily use). While not precisely 
identical to the frequencies stated in the question, 
this procedure nevertheless yields an approxima- 
tion of use via an ordinal scale. Furthermore, this 
scale is conservatively biased inasmuch as the 
transitions at the lower end of the scale (e.g., from 
nonuse to use once or twice per year) are of a lesser 
order than the transitions at the upper end of the 
scale (e.g., from weekly to daily use). 
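
The coding rule just described can be illustrated with a minimal sketch. The response labels below are paraphrased from the text and the function names are hypothetical; nothing here is taken from the original scoring materials.

```python
# Hypothetical response labels, paraphrased from the description above.
USE_SCALE = {
    "do not use": 1,
    "once or twice a year": 2,
    "once or twice a month": 3,
    "once or twice a week": 4,
    "daily": 5,
}


def code_present_use(response: str) -> int:
    """Map a present-use response to its ordinal value (1 = nonuse, 5 = daily use)."""
    return USE_SCALE[response.strip().lower()]


def mean_use(responses):
    """Mean ordinal score over answered items; unanswered items (None) are skipped."""
    scores = [code_present_use(r) for r in responses if r is not None]
    return sum(scores) / len(scores) if scores else None
```

Because the values are ordinal, a mean computed this way is only an approximation of use, as the text notes: the step from 1 to 2 represents a smaller change in frequency than the step from 4 to 5.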

These questions were previously used by Stuart 
and Schuman (1972) with apparently good results. 
The reliability of the questions was established in 
two ways. First, the questions were administered 
to 50 subjects within a seven-day period and 
yielded a test-retest reliability of .86. Second, 
scores of the two questions containing “do not use
this drug” as an alternative and of the four ques-
tions containing “have not used this drug” as an
alternative could be cross-checked for each of the 
seven classes of drugs investigated. The first set of 
questions yielded 7 comparisons (one for each 
class of drugs), while the second set of questions 
yielded 42 comparisons (all six permutations of the 
four questions for each of the seven classes of 
drugs). The average level of consistency in answer- 
ing these questions (i.e., the number of subjects 
giving the same answer to each pair of questions) 
ranged from 97.3% to 99.8%, reflecting a high level 
of reliability. 
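
The counts of 7 and 42 comparisons follow from pairing the questions within each set: two questions give one pair and four questions give six pairs, each repeated for the seven classes of drugs. The sketch below reproduces that arithmetic and shows one plausible way to compute the consistency percentage; the function names and the data layout (one answer per subject per question) are assumptions made for illustration.

```python
from itertools import combinations

DRUG_CLASSES = 7  # classes of drugs covered by the instrument


def n_comparisons(n_questions: int, n_classes: int = DRUG_CLASSES) -> int:
    """Number of question pairs that can be cross-checked over all drug classes."""
    n_pairs = len(list(combinations(range(n_questions), 2)))
    return n_pairs * n_classes


def pair_consistency(answers_a, answers_b) -> float:
    """Percentage of subjects giving the same answer to both questions of a pair."""
    pairs = list(zip(answers_a, answers_b))
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)
```

Here `n_comparisons(2)` gives 7 and `n_comparisons(4)` gives 42, matching the figures in the text.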

The face validity of these questions could be 
examined only in terms of the conformity to a 
priori expectation of answers to varying questions. 
The six tests of face validity were as follows: 

1. It was shown that the frequency of use of 
drugs at least once declined steadily from 81.2% 
for alcohol to 1.3% for narcotics, with virtually no 


reported use of solvents. 
2. Every subject who reported a high level of pres-
ent use of a drug (e.g., monthly, weekly, or daily
use) also reported a high level of past use of
that drug (e.g., 10 or more times).

3. The percentage of subjects who reported having
sold marijuana was 7.4%, as opposed to 3.3% who
reported having sold LSD.

4. The number of subjects who reported having 
injected LSD, stimulants, depressants, narcotics, 
or solvents was under 1.5%. 

5. The number of bad drug experiences reported 
corresponded with the number of times in which 
subjects reported having used drugs in the past 
(i.e., for alcohol, marijuana, and LSD, the correla- 
tions were .32, .35, and .36, respectively [df = 173,
p < .01]).

6. The frequencies of use of drugs at least once, 
reported in this study, closely corresponded to 
those found in a survey conducted in the same 
community one year earlier (Stuart & Schuman, 
1972); for example, both studies found that ap- 
proximately 74% of the seventh and ninth graders 
had used alcohol at least once, while marijuana 
and LSD were used by approximately 23% and 8% 
of these same groups, with use levels being slightly 
lower in the present investigation. 

Taken together these findings appear to bear out 
the face validity of these questions, with predic- 
tive validation depending upon data that were 
unavailable in this study (i.e., the number of 
police contacts, peer reports of drug use, or contacts 
with drug crisis treatment centers). 

The second portion of the instrument contains
14 items measuring worry about drugs (e.g., “I am
afraid of becoming addicted”), 5 items measuring 
the span of drug-related deviance (e.g., “As long 
as the buyer wants drugs, there is no reason to feel 
guilty about selling them”), and 5 items measuring 
drug-related alienation (e.g., “People use drugs 
because they find it hard to be happy other ways”).
Each item was answered on a 5-point scale ranging
from strongly agree to strongly disagree. Low
scores indicate low worry and deviance tolerance,
while high scores indicate a high level of aliena- 
tion. The Kuder-Richardson reliabilities for each 
of these scales were .92, .80, and .19, respectively,
with corresponding test-retest reliabilities of .88,
.81, and .61. Responses to the three scales were also
significantly intercorrelated among the 757 sub- 
jects, whose responses to all three scales were com- 
plete at the time of the pretest. These correlational 
coefficients were —.43 for worry and deviance, .51 
for deviance and alienation, and —.54 for worry 
and alienation (p < .001). Thus, as expected, those 
who worried the most about drugs had the lowest 
deviance scores and those who felt themselves to 
be the most alienated had the highest deviance 
scores. (No a priori expectations were entertained 
for the association between worry and alienation.) 

Four partial validity checks were undertaken. 

In the first of these, the scores of each of the scales 
were intercorrelated with the subjects’ reports of 
their use of alcohol, marijuana, and LSD. These 
correlations were as follows: 
Alcohol Marijuana
Worry —.34 —.44
Deviance —.82 —.48
Alienation .20 .14


All of these correlations were significant (df 
p < .05) and in the expected direction with 
correlations for alienation being quite low. In 
second validity check, the deviance scale respol 
of those who did and did not report that they 
sold marijuana were compared. The deviances 
had a range of 5 (a high deviance score) to 2 
low deviance score). The 82 subjects who repo 
having sold marijuana had a mean score of ff 
(SD = 13.39) indicating higher deviance, while 
774 subjects who reported not having sold m 
juana had a mean score of 15.506 (SD = 
indicating lower deviance. A test of these d 
ences yielded a t of 7.2441 (df = 854, p < O
Third, the deviance scores of subjects who 
ported that they could obtain alcohol, marij 
or LSD very easily, easily, or with difficulty o 
at all were compared. All three analyses of 
ance yielded significant results (F = 5.0957, df 
511, p < .001; F = 9.6136, df = 505, p < 4 
F = 9.9719, df = 493, p < .001, for the three dru 
respectively). In each instance, deviance toleral 
was higher for subjects finding it easy rather 
difficult to obtain drugs. For example, the a 
deviance score of those reporting ease in obtai 
LSD was 13.40, as opposed to 15.08 for thos 
porting great difficulty in obtaining the 
Finally, the extent of worry about drugs was 
correlated with the number of bad experie 
the level of .23, a low but significant (p < .05) 
relation. In view of these four sets of findi 
can be concluded that the three sets of at 
measures have at least defensible face validity.
The third portion of the instrument me 
the subject's knowledge of the pharmaco 
psychological effects, and the legal implicati 
drug use. Items in this series were multiple 3 
tiple-choice questions, for example, “An ovi 
of which of the following can cause death? 4 
hol? Methedrine? Morphine? Seconal?" or “D 
are metabolized (broken down) in the: small 
testine? liver? kidneys? pancreas?” This 7| 
scale was also used by Stuart and Schuman ( 
where its Kuder-Richardson reliability atta a 
level of .96 and a test-retest reliability of .85- 
In addition to these three substantive ate 
the instrument also contained questions né 
for sample descriptions such as grade, sex, 8cl 
etc. Finally, in keeping with the suggest
King (1970), all questionnaires were com] 
anonymously and were identified for mal j 
purposes only by an alphanumeric code: the 
three letters of the respondents mother's mal 


these protocols could be matched for pre- and p! 
test data (594) and fewer still for pretest, posit 
and follow-up data, the analyses reported b 
are based exclusively upon the pooled group re-
sponses of the full 935-subject sample.


Procedure 


Drug education was offered one day a week in 
each participating class for a period of 10 weeks. 
Most individual sessions included a pretest and 
posttest of knowledge of the topics under discus- 
sion and were built around a combination of lec- 
ture, discussion, and audiovisual techniques. The 
topics for each of the sessions were as follows: 
pharmacology and physiology background, drugs 
and the nervous system, drugs and society, and 
discussions of each of the seven classes of drugs 
identified in the questionnaire. During the first 
semester, one half of the experimental classes were 
randomly assigned to either teacher- or student- 
led patterns of instruction. Both groups had the 
same three teacher-led general presentations (e.g., 
drugs and society). However, while the teachers 
continued to lead the instruction about particular 
drugs in half of the classes, in the other half of the 
classes students met in groups to develop educa- 
tional materials and then presented these ma- 
terials to the class. Because attendance and class 
participation levels were lower for student-led 
instruction and because there were no significant 
outcome differences between teacher- and student- 
led groups after the first semester (see Data Analy- 
sis), both were combined for the final data analy- 
sis, and all instruction during the second semester 
was teacher led. During both semesters, the con- 
tent of instruction was divided in a manner that 
permitted one third of the experimental subjects 
to receive the A, B, or AB program contents. 


Data Analysis 


Data were analyzed using conventional analy-
sis of variance, covariance, t-test, and chi-square
programs available through the MIDAS system of 
the Statistical Research Laboratory, University 
of Michigan. To evaluate the relative contribu- 
tions of knowledge and worry to the use of drugs, 
the Kullback information or likelihood ratio was
used (Kullback, 1968, pp. 159-169; Kullback,


? The results presented are all derived from 
analysis of group comparisons. Because the ques- 
tionnaires of only approximately one out of five 
(first-semester) students could be matched on pre-
test, posttest, and follow-up, no analyses were per-
formed using these data. Analyses of covariance
were performed for the four out of nine subjects
whose pretest and posttest questionnaires could be
matched. These analyses all closely paralleled the
results from the full subject cohort with two excep-
tions: As opposed to the larger group, change in
worry about drugs was marginally nonsignificant
(p < .06), and the change in deviance was not sig-
nificant at posttest. To avoid redundancy these
data are not reported here, but tables are available
upon request.
Kupperman, & Ku, 1962). This procedure permits
a three-way analysis of (a) the independence of 
three variables, (b) the extent to which column 
variables are independent of layer variables, (c) 
the extent to which row variables are independent 
of layer and column variables, (d) the extent to 
which row variables are independent of layer var- 
iables, and (e) the extent to which row variables 
are independent of column variables for each level 
of the layer variable. If drug use were the column
variable, worry about drugs the row variable,
and level of knowledge the layer variable, the
information ratio would indicate the level of in- 
dependence of variables at all points of intersec- 
tion. The information ratio thus yields a higher 
level of analysis than would be available through 
the use of chi-square, yet its use is appropriate 
whenever chi-square can be applied. 
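
As an illustration of component (a), the test of complete independence of the three variables, the statistic can be written in its usual likelihood-ratio form, 2I = 2 Σ n(ijk) ln[n(ijk) / e(ijk)], where e(ijk) is the count expected from the three sets of marginal totals. The sketch below is a minimal reading of that formula under stated assumptions: it is not the MIDAS routine used in the study, and the function name and nested-list table layout are hypothetical.

```python
import math


def information_statistic(table):
    """Return (2I, df) for complete independence in a rows x columns x layers table.

    `table[i][j][k]` is the observed count for row i, column j, layer k.
    Cells with a zero count contribute nothing to the sum.
    """
    R, C, L = len(table), len(table[0]), len(table[0][0])
    cells = [(i, j, k) for i in range(R) for j in range(C) for k in range(L)]
    n = sum(table[i][j][k] for i, j, k in cells)

    # One-way marginal totals for rows, columns, and layers.
    row = [sum(table[i][j][k] for j in range(C) for k in range(L)) for i in range(R)]
    col = [sum(table[i][j][k] for i in range(R) for k in range(L)) for j in range(C)]
    lay = [sum(table[i][j][k] for i in range(R) for j in range(C)) for k in range(L)]

    stat = 0.0
    for i, j, k in cells:
        observed = table[i][j][k]
        if observed > 0:
            expected = row[i] * col[j] * lay[k] / n ** 2
            stat += 2.0 * observed * math.log(observed / expected)

    df = R * C * L - R - C - L + 2
    return stat, df
```

In this sketch, drug use would index the columns, worry the rows, and knowledge the layers. The resulting value is referred to the chi-square distribution with the returned degrees of freedom, which is why the procedure applies wherever chi-square does.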

In the analysis that follows, data are presented 
for only three classes of drugs: alcohol, marijuana, 
and LSD. The remaining classes of drugs (stimu- 
lants, depressants, narcotics, and solvents) were 
used by fewer than 5% of the subjects at all times 
of testing and, therefore, did not yield sufficient 
variance to permit an analysis of group differences. 
Finally, all analyses in this report were based 
upon the number of subjects who completely an- 
swered the relevant question. Because it was not 
uncommon for subjects to omit one or more sec- 
tions of any given question, the number of subjects 
contributing to the various analyses was quite 
variable, although the number of protocols with 
missing data was quite consistent with the expec- 
tation for the present type of research and subject
population.


RESULTS


Drug Education 

The major results of this study are sum- 
marized in Table 1 in which it can be seen 
that at pretest, seventh graders had 
predictably less information about drugs 
than did ninth graders, and experimental 
subjects at both grade levels had more drug 
knowledge than did controls at similar 
levels. Also, experimental subjects gained 
more knowledge than did controls, and the 
differences between experimentals and con- 
trols narrowed slightly but were nevertheless 
sustained at follow-up. Similar patterns 
were found for drug use and worry about
drugs.


For each of the variables in Table 1, an 
analysis of variance was performed at 
pretest, posttest, and follow-up times to 
determine the significance of the three-way 
interaction between condition (experimental 
versus control), grade (seventh versus ninth) 
and sex. For the first five variables, all 18
F ratios reached the level of significance
(p < .05). Independent a posteriori t tests
were performed to locate specific significant 
differences between the means of each of 
these groups singly and in interaction with 
each other. This analysis revealed that ex- 
perimental subjects did not differ sig- 
nificantly from controls before undergoing 
the educational program, but they did show 
significantly (p < .05) more knowledge, 
greater alcohol, marijuana, and LSD use, 
and less worry than did controls at posttest 
and at follow-up. Through t tests it was also
shown that seventh- and ninth-grade sub- 
jects differed in each of these responses at 
all three testing times except for the follow- 
up assessment of drug information, when 
ninth-grade subjects scored higher, although 
not significantly so. 

The only significant differences not in- 
volving the experimental condition, grade, 
or their interaction were sex differences 
found in the lower alcohol use reported by 
girls at pretest and the greater worry re- 
ported by girls at follow-up. In a marginally 
significant three-way interaction, seventh- 
grade experimental girls were found to have 
higher knowledge than did their male peers, 
while ninth-grade girls outperformed boys 
on the knowledge inventory in both ex- 
perimental and control conditions, although 
not significantly. 

In the final two areas covered in Table 1, 
measures of deviance tolerance and aliena- 
tion, a less definitive pattern of results 
emerged. Significant F ratios were obtained 
for the deviance measure at pre- and post- 
test, but the F ratio at follow-up was not 
significant. At pretest, significant t tests 
showed ninth graders to be more accepting 
of drug use as nondeviant than were seventh 
graders (p < .001) and boys to be more 
accepting than were girls (p < .001). At 
posttest, ninth graders continued to be more 
accepting than were seventh graders (p < 
.001), but the sex difference was replaced by 
a significant increase in the acceptance of 
drugs by experimental subjects as opposed 
to controls (p < .01). None of these dif- 
ferences were maintained at the follow-up, 
however. While it is possible that the pro- 
gram had merely a short-lived impact upon 
this attitude, it is more likely that the mid-
semester passage of a local ordinance making 
marijuana possession a $5 misdemeanor un- 
doubtedly lessened the extent to which both 
experimental and control subjects considered 
marijuana use to be deviant. Because the 
alienation scale yielded no significant F-ratio 
differences, further analysis of data gener- 
ated by it was not undertaken. 

Figure 1 presents changes in the percent- 
age of experimental and control subjects re- 
porting that they had sold either marijuana 
or LSD before and after the educational 
period and at follow-up. The figure clearly 
demonstrates that more experimental than 
control subjects reported having sold drugs 
at pretest, with the differences accelerating 
at posttest for both drugs and at follow-up 
for marijuana only. Chi-square tests of the 
significance of these differences were com- 
puted for both drugs at each time of testing. 
Figure 1. Percentages of students who sold
marijuana and LSD at pretest, posttest, and
follow-up.
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The pretest differences for marijuana and 
LSD were both nonsignificant (x? = 3.1121 
and .1756, respectively, df = 1), while 
posttest differences were significant for both 
drugs (x? = 7.6635 and 5.0276, respectively, 
df = 1, p < .03). At follow-up, there were 
significantly more marijuana sellers in the 
experimental group (x? = 5.8557, df = 1, p 
< .02), but the difference in the number of 
sellers of LSD was not significant (x? = 
2.2615, df = 1). ' 
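The significance levels quoted for the sales comparisons can be checked from the chi-square values alone. The following is a minimal sketch, not the authors' procedure: it relies on the fact that a chi-square variate with one degree of freedom is a squared standard normal deviate, and the function name `chi2_p_df1` is introduced here purely for illustration.

```python
import math


def chi2_p_df1(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic is the square of a standard normal deviate,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Chi-square values as reported in the text (df = 1 in every case).
reported = {
    ("pretest", "marijuana"): 3.1121,
    ("pretest", "LSD"): 0.1756,
    ("posttest", "marijuana"): 7.6635,
    ("posttest", "LSD"): 5.0276,
    ("follow-up", "marijuana"): 5.8557,
    ("follow-up", "LSD"): 2.2615,
}

p_values = {key: chi2_p_df1(x) for key, x in reported.items()}

for (occasion, drug), p in p_values.items():
    print(f"{occasion:9s} {drug:9s} p = {p:.4f}")
```

Under this assumption the two pretest values and the follow-up LSD value stay above the .05 level, while the remaining three fall below the thresholds cited in the text.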


Effects of the Type of Instruction 


Differences between the effects of the 
student-led instruction received by 128 
students and the teacher-led instruction re- 
ceived by 136 students during the first 
semester were evaluated by t tests. There
were no significant differences at pretest,
establishing the initial equivalence of the
groups. At posttest and at follow-up there 
were no significant differences between the 
two groups for the use of alcohol, marijuana, 
or LSD and no differences in the amount of 
knowledge, deviance, or alienation scores. 
Of the 14 t tests, the only significant differ-
ence (t = 2.0540, df = 230, p < .05) occurred
in the lower level of worry among the peer- 
led classes (X = 14.09, SD = 3.76) at post- 
test. This difference was not maintained at 
follow-up, however. As this single difference 
could have occurred on the basis of chance 
alone, data from the student-led and 
teacher-led classes were combined and were 
subsequently analyzed along with data from 
the second-semester teacher-led classes. 


Effects of the Content of Instruction 


The effects of the three types of content, 
A, B, and AB, upon the use and attitude 
variables were evaluated at posttest and at 
follow-up. Because data relating to the use 
of major drugs were unavailable for analysis, 
it was possible to evaluate the effects of the 
A, B, AB contents on the use and attitude 
variables of A drugs only. Prior to instruc- 
tion, analyses of variance of subjects as- 
signed to classes in the three groups revealed 
no significant differences. Of the 14 analyses 
of the posttest and follow-up data, only one 
was significant. There was a significant dif- 
ference for alcohol use at follow-up (F = 
4.2647, df = 190, p < .02) in which it was
found that those receiving AB content used
significantly more alcohol (X = 2.87, SD =
1.24, n = 63) than did those receiving only
B content (X = 2.27, SD = 1.45, n = 70)
but not significantly more than did those 
receiving only A content (X = 2.50, SD = 
1.58, n = 58). Because this single finding 
might have occurred by chance, it has to be 
concluded that the content of the drug cur- 
riculum tended to be nonspecific in its ef- 
fects; that is, changes in drug-use patterns 
and attitudes about use tended to occur in- 
dependently of the specific content of in- 
struction. 


Concomitants of Use 



In order to explore some of the concomi-
tants of subjects’ decisions to initiate ex-
perimentation with drugs, the impact of
knowledge and worry was examined. First it
was found by chi-square analysis that when
knowledge and worry were each divided into
high, medium, and low categories, their in-
terrelationship was not significant (χ² =
8.3652, df = 4, p < .10). Tables 2 and 3
show the knowledge and worry scores of
subjects reporting use of alcohol, marijuana,
and LSD. It can be seen that for marijuana
only a significantly higher percentage of
users as opposed to nonusers is found to have
high knowledge, while significantly more users
of all three drugs had less worry than non-
users. It is also obvious, however, that from
one-fifth to one-third of the users of all three
drugs were in the lowest knowledge category,
and from one-fourth to one-half of those
whose worry about drugs was in the lowest
category were, nevertheless, not drug users.
Therefore, neither high knowledge nor low
worry alone appears to be necessarily pre-
dictive of drug use. It is therefore important
to explore the possible interaction of these
two factors as potential predictors of use.
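The chi-square values attached to Tables 2 and 3 can be recomputed from the cell counts. The sketch below assumes an ordinary Pearson chi-square on the 3 × 2 table of counts (the text does not name the formula), and the helper name `pearson_chi_square` is introduced only for illustration. It uses the marijuana columns of Table 2.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of information level and use.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Marijuana columns of Table 2: rows are low, medium, and high information;
# columns are (do not use, do use).
marijuana_by_information = [
    [165, 33],
    [180, 36],
    [281, 89],
]

chi2, df = pearson_chi_square(marijuana_by_information)
print(f"chi-square = {chi2:.4f}, df = {df}")
```

If the assumption holds, the result should land very close to the 6.6263 (df = 2) printed in Table 2; the same function can be applied to the alcohol and LSD columns or to the worry counts in Table 3.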

TABLE 2

LEVEL OF INFORMATION ABOUT DRUGS AMONG SUBJECTS WHO DO AND DO NOT USE ALCOHOL,
MARIJUANA, AND LSD

Level of information                | Alcohol                   | Marijuana                 | LSD
                                    | Do not use  | Do use      | Do not use  | Do use      | Do not use  | Do use
                                    |   n |     % |   n |     % |   n |     % |   n |     % |   n |     % |   n |     %
Low (X = 28.96, SD = 7.74)          |  94 |  39.7 | 185 |  33.5 | 165 |  26.4 |  33 |  20.9 | 259 |  35.4 |  19 |  35.2
Medium (X = …, SD = 2.59)           |  77 |  32.5 | 185 |  33.5 | 180 |  28.8 |  36 |  22.8 | 246 |  33.7 |  16 |  29.6
High (X = 51.66, SD = 4.55)         |  66 |  27.8 | 183 |  33.1 | 281 |  44.9 |  89 |  56.3 | 226 |  30.9 |  19 |  35.2
Chi-square (df = 2)                 | 3.3048                    | 6.6263*                   | .5383

*p < .05.

The interaction between drug use, infor-
mation, and worry was assessed at follow-up
using the Kullback information ratio. Fol-
low-up data were used on the assumption
that they offered the best opportunity to
assess residual effects of the program. Table
4 demonstrates that (a) use, worry, and
knowledge are not independent and, there-
fore, significantly covary; (b) only among
LSD users do worry and knowledge covary;
(c) the 2 × 2 tables formed by dividing users
of alcohol, marijuana, and LSD according to
whether they are high or low in worry and
knowledge all yield significant differences;
(d) level of knowledge alone is insufficient
for predicting use; and (e) use can be pre-
dicted from an interaction between low
worry and high knowledge, with high worry-
high knowledge not being predictive of use.
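The text does not spell out how the information ratio was computed. One common form of Kullback's information statistic for a contingency table is 2I = 2 Σ O ln(O / E), which is referred to the chi-square distribution; the sketch below assumes that form. Because the three-way counts behind Table 4 are not given, it is applied, purely as an illustration, to the two-way marijuana counts of Table 3. The function name `information_statistic` is hypothetical.

```python
import math


def information_statistic(table):
    """2I = 2 * sum(O * ln(O / E)) for a two-way table of counts.

    E is the expected count under independence of rows and columns.
    Cells with an observed count of zero contribute nothing to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / grand_total
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Marijuana columns of Table 3: rows are low, medium, and high worry;
# columns are (do not use, do use).
marijuana_by_worry = [
    [179, 98],
    [238, 63],
    [247, 9],
]

two_i = information_statistic(marijuana_by_worry)
print(f"2I = {two_i:.2f} on 2 df")
```

Under this assumed definition the statistic far exceeds 13.816, the .001 critical value of chi-square with 2 df, which agrees in direction with the Pearson chi-square reported for the same counts.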


Discussion 


Before discussing the results of this study,
seven caveats should be noted. First, it
should be borne in mind that the data for
this research were drawn from anonymous
questionnaires and that use data were
wholly self-reports. Anonymous question-
naires were selected for two reasons. First,
it was believed that they were the only
means through which students would
honestly answer questions about their il-
legal use. Second, the anonymity of re-
sponses precluded their being subpoenaed
in any court proceedings. Self-report ques-
tionnaires were used because they were far
more practical than the alternative means
of determining use: peer or police reports,
or urine, blood, and other physical analyses
(which would pose a serious civil rights
dilemma). It is, of
course, possible that anonymous self-report
answers, cloaked by their nonaccountability,
could be capricious in addition to being
subject to biased recall. Despite these risks,
practicality demanded the use of both ex-
pedients, but their threats to the validity of
the data should not be forgotten.

TABLE 3

LEVEL OF WORRY ABOUT DRUGS AMONG SUBJECTS WHO DO AND DO NOT USE ALCOHOL, MARIJUANA,
AND LSD

Level of worry                      | Alcohol                   | Marijuana                 | LSD
                                    | Do not use  | Do use      | Do not use  | Do use      | Do not use  | Do use
                                    |   n |     % |   n |     % |   n |     % |   n |     % |   n |     % |   n |     %
Low (X = 10.05, SD = 7.74)          |  64 |  26.0 | 203 |  34.5 | 179 |  27.0 |  98 |  57.5 | 234 |  30.1 |  33 |  56.9
Medium (X = 15.81, SD = 1.49)       |  63 |  25.6 | 203 |  34.5 | 238 |  35.8 |  63 |  37.1 | 241 |  31.0 |  23 |  39.7
High (X = 18.55, SD = 2.61)         | 119 |  48.4 | 183 |  31.1 | 247 |  37.2 |   9 |   5.3 | 302 |  38.9 |   2 |   3.4
Chi-square (df = 2)                 | 22.512*                   | 83.319*                   | 31.977*

*p < .001.

TABLE 4

INFORMATION RATIO ANALYSIS OF DRUG USE AS A FUNCTION OF WORRY AND KNOWLEDGE ABOUT DRUGS
AT FOLLOW-UP

Source                        | df | Alcohol (n = …) | Marijuana (n = …) | LSD (n = …)
Use: worry: knowledge         |  4 |   19.0084*      |   51.5921*        |   44.8164*
Worry: knowledge              |  1 |    1.9875       |    1.3417         |   23.1437*
Use: worry, knowledge         |  3 |   17.0209*      |   50.2504*        |   21.6727*
Use: knowledge                |  1 |     .2002       |    1.6498         |    1.7777
Use: worry/knowledge          |  2 |   16.8207*      |   48.6006*        |   19.8950*
Use: worry/low knowledge      |  1 |    2.6529       |    2.8514         |     .4631
Use: worry/high knowledge     |  1 |   14.1678*      |   45.7492*        |   19.4319*

*p < .001.
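The entries of Table 4 appear to form an additive partition: each component is the sum of the two components into which it is broken down. The sketch below checks that arithmetic. The three values per row are taken in the printed column order, which is assumed here to be alcohol, marijuana, and LSD; the values 48.6006 and .4631 are read from damaged print, and the dictionary keys and the function name `max_partition_error` are introduced only for illustration.

```python
# Information-ratio components as read from Table 4.
# Column order is assumed to be (alcohol, marijuana, LSD).
table4 = {
    "use: worry: knowledge": (19.0084, 51.5921, 44.8164),
    "worry: knowledge": (1.9875, 1.3417, 23.1437),
    "use: worry, knowledge": (17.0209, 50.2504, 21.6727),
    "use: knowledge": (0.2002, 1.6498, 1.7777),
    "use: worry/knowledge": (16.8207, 48.6006, 19.8950),
    "use: worry/low knowledge": (2.6529, 2.8514, 0.4631),
    "use: worry/high knowledge": (14.1678, 45.7492, 19.4319),
}

# Each parent component should equal the sum of the components listed for it.
partitions = {
    "use: worry: knowledge": ("worry: knowledge", "use: worry, knowledge"),
    "use: worry, knowledge": ("use: knowledge", "use: worry/knowledge"),
    "use: worry/knowledge": ("use: worry/low knowledge", "use: worry/high knowledge"),
}


def max_partition_error(table, partitions):
    """Largest absolute gap between a parent component and the sum of its parts."""
    worst = 0.0
    for parent, parts in partitions.items():
        for column in range(len(table[parent])):
            parts_sum = sum(table[part][column] for part in parts)
            worst = max(worst, abs(table[parent][column] - parts_sum))
    return worst


worst_error = max_partition_error(table4, partitions)
print(f"largest discrepancy = {worst_error:.4f}")
```

The degrees of freedom partition in the same way (4 = 1 + 3, 3 = 1 + 2, 2 = 1 + 1), which is consistent with reading the table as a hierarchical decomposition of a single overall statistic.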

The second qualification worth noting is 
that the results of this investigation may not 
generalize beyond the work of the two 
teachers who delivered a specific curriculum 
to students in two grade levels of a domi- 
nantly upper-middle-class, academic com- 
munity. Without replication by other edu- 
cators, using different materials in a broad 
array of settings, the results reported here 
must be regarded as little more than sug- 
gestive. 

Third, these findings should be considered 
in light of possible response biases, which 
might have accounted for some of the ob- 
served differences. All data were collected 
from experimental and control subjects by 
the two teachers. It is quite possible that 
subjects in the experimental group who had 
10 contact hours with the drug educators by 
the time of posttest and follow-up might 
have been more honest in their responses 
than were the controls with whom the 
teachers were relatively unfamiliar. This 
could account for the differences that ap-
peared following the initial testing. How-
ever, at pretest, experimental subjects showed
higher drug use, knowledge, or scores on at-
titudes conducive to drug use. This would
suggest that a response bias was established
merely by virtue of assignment to a drug
education class, lending some credence to
the possibility that merely becoming the
target of drug education could intensify the
possibility that drugs might be used. In
future studies, it is important to arrange for
the collection of evaluation data by researchers
who are not directly associated with the
instructional program, thereby avoiding the
possible confounding of results that might
have occurred here. Furthermore, the
fact that subjects were exposed to the in- 
strument as many as three times might also 
have introduced a bias in the results, which 
could have led both experimental and con- 
trol subjects to admit to higher levels of use 
and sale. While the results that were ob- 
tained closely correspond to the survey data 
taken from comparable populations in the 
same community one year earlier, inclusion 
of a posttest-only condition could have ob-
viated this as a possible problem.

Fourth, the conclusions of this research do
not assess the impact of the program upon
the patterns of use by subjects who have
taken amphetamines, barbiturates, or nar-
cotics. It is plausible that while participa-
tion in the program might have hastened
some subjects’ experimentation with alco-
hol, marijuana, or LSD, other participants
in the same program might have diminished
their use of the first three classes of drugs.
Data to answer this question are unavailable
in the present study because of the insuffi-
cient number of users of these drugs found
in the junior high school sample. The re-
search was undertaken in junior high school
because of a belief that it would be more
beneficial to attempt to interrupt drug use
earlier rather than later in the careers of
users, and because the school system in
which the study was conducted regularly re-
quires drug education at this academic level.
In the interests of a more comprehensive
evaluation, however, future researchers
might be advised to select subject popula-
tions known to represent a broader spec-
trum of drug users.

Fifth, it is important to bear in mind that 
the subjects in this study were relatively 
slight users of alcohol, marijuana, and LSD.
Inspection of Table 1 reveals that of the 
three drugs in question, average use levels 
both before and after the educational pro- 
gram were at one to two times per month 
for alcohol and one to two times per year for 
marijuana and LSD. Compared with reports 
of drug use in other localities, these are 
modest levels. Changes in use were likewise 
of a low average level. Therefore the findings 
of this research may also not generalize to 
populations that include comparatively 
larger numbers of committed users. 

Sixth, the follow-up period in this research 
was a brief four months. It is possible that a 
longer follow-up period might have revealed 
more positive changes. However, inspection
of Figures 1 and 2 suggests that sale and use 
of drugs by experimental subjects were still 
rising at the time of follow-up, creating the 
possibility that a longer follow-up period 
might show even more negative changes. 

Finally, the results of this study cannot be 
generalized to all forms of drug education. 
The present effort was exclusively devoted
to imparting drug-related information. Pro- 
grams that are oriented to promoting value 
change, to enhancing social or academic 
functioning, or to other indirect approaches 
to drug use may have entirely different ef- 
fects than do those encountered in the pres- 
ent investigation. 

With these qualifications in mind, three 
principal conclusions were drawn from this 
research. First, it was concluded that drug 
education may not only fail to impede the use of
drugs, it may actually exacerbate drug use.
As a means of cross-checking the data re- 
ported earlier, subjects were asked at pre- 
test, posttest, and follow-up whether their 
use of drugs was greater than, equal to, or 
less than it had been three months earlier. 
While 5.4% of the experimental subjects and
3.7% of the control subjects reported esca- 
lating drug use at pretest, these percentages 
rose to 13.0% and 4.1%, respectively, at 
follow-up. Subjects’ self-assessment thus 
appeared to corroborate judgments drawn 
from other descriptive measures. Threats to 
the validity of this conclusion based upon 
the historical fact of passage of the local 
marijuana ordinance were assumed to be 
controlled by the use of random assignment 
of subjects within the experimental design, 
and the contamination of experimenter- 
subject interaction was hopefully lessened 
after the interval of the follow-up period. 

Related to this first conclusion is the ap-
parent fact that participation in a drug edu-
cation program might lead to a dispersion of 
effects in other behavioral and attitudinal 
areas. For example, inasmuch as it was
found that sellers of drugs had significantly
greater (t = 4.19, df = 814, p < .001) knowl-
edge than had nonsellers (X = 42.35, SD =
14.88, n = 67; X = 36.98, SD = 9.59, n = 749,
respectively) and that they also had signifi-
cantly less (t = 12.49, df = 923, p < .001)
worry about drugs than had nonsellers (X
= 12.49, SD = 10.68, n = 72; X = 15.829,
SD = 11.37, n = 853, respectively), changes
mediated by drug education might well
eventuate in effects far beyond the expecta-
tions of the drug educator. In light of this
possibility, it is wise for drug educators to
carefully consider the wisdom of undertaking
any program that might disturb decisions 
made at the natural choice points of the 
paths leading away from or toward drug use. 
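The knowledge comparison between sellers and nonsellers can be approximately reproduced from the summary statistics quoted above. The sketch assumes a pooled-variance (Student) t test, which the text does not state outright but which is consistent with the reported df of 814 (67 + 749 - 2). The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Knowledge scores: sellers versus nonsellers, as quoted in the text.
t_knowledge, df_knowledge = pooled_t(42.35, 14.88, 67, 36.98, 9.59, 749)
print(f"t = {t_knowledge:.2f}, df = {df_knowledge}")
```

Under this assumption the statistic should land near the reported t = 4.19; any small gap would be expected from rounding in the published means and standard deviations.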
The second important conclusion reached 
by this study was that classroom instruction 
can lead to a significant increase in the level 
of students’ knowledge about drugs. It has 
been found that students undergoing drug 
education learned significantly more than 
did controls who did receive some drug ed- 
ucation through other less concentrated pro- 
grams in the two schools, through drug facts 
frequently discussed in mass media, on the 
street, or through any literature that they 
may have sought on their own initiative. 
More important, however, is the finding 
that neither the format nor the content of 
the instructional program appeared to in- 
fluence the rate of knowledge acquisition. 
Student-led discussions, which paradoxically 
require considerably more time in planning 
and skill in execution, offered little if any 
advantage over more conventional teacher- 
led programs. This result is consistent with 
the findings of others (e.g., Swisher, Warner, 
& Herr, 1972) who found no significant dif- 
ferences following even more disparate pro- 
grams. Furthermore, general knowledge 
about drugs increased despite the fact that 
one-third of the group was educated only 
about alcohol, marijuana, and LSD, while 
another third was educated only about am- 
phetamines, barbiturates, and narcotics, 
with the third subsample receiving education 
about all six classes of drugs. Unfortunately, 
the drug information measure used in this
research assesses general information and 
does not lend itself to a dichotomization of 
content paralleling the divisions in instruc- 
tional content. Future researchers might be 
advised to plan for such an evaluation by 
matching units of instructional content with 
sectors of the evaluation instruments. 

The third major conclusion of this re- 
search was that the association between 
drug education and drug use may not be a 
simple one. Drug education was observed to 
lead to greater knowledge about drugs, but 
increased knowledge alone was not predic- 
tive of intensified use. One of the important 
mediating variables in the chain leading 
from information to use appears to be the 
level of worry about bad drug effects. But 
high knowledge does not necessarily lead to 
low worry and low worry does not neces- 
sarily lead to high drug use. Therefore other 
factors may also be influential. In the pres- 
ent sample, data from subjects who indicated 
that they had experienced one or more bad 
drug effects were analyzed to isolate factors 
associated with decisions to continue or dis- 
continue drug use following these experi- 
ences. Among users of alcohol, marijuana, 
and LSD, 18.7% (n = 109), 23% (n = 41), 
and 47.9% (n = 35), respectively, reported 
having had bad experiences. Of these, 23.9%
of those who had bad experiences with alco-
hol discontinued use, as did 17.1% of mari-
juana users and 37.1% of LSD users. There-
fore bad experiences with LSD were clearly
more likely to be associated with discon- 
tinuance than were bad experiences with the 
other two drugs for which there is somewhat 
greater social support for use. 
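To give a sense of scale, the discontinuation percentages can be converted back into approximate head counts. These counts are inferred from the quoted n values and percentages, not reported by the authors, and the function name `implied_count` is introduced only for illustration.

```python
# (number reporting a bad experience, percent of those who discontinued use),
# as quoted in the text.
bad_experience = {
    "alcohol": (109, 23.9),
    "marijuana": (41, 17.1),
    "LSD": (35, 37.1),
}


def implied_count(n: int, percent: float) -> int:
    """Whole number of subjects implied by a percentage of n."""
    return round(n * percent / 100.0)


discontinuers = {
    drug: implied_count(n, percent) for drug, (n, percent) in bad_experience.items()
}

for drug, count in discontinuers.items():
    n = bad_experience[drug][0]
    print(f"{drug}: about {count} of {n} discontinued")
```

The implied groups are small (roughly 26, 7, and 13 subjects), which is worth keeping in mind when comparing the three discontinuation rates.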

Looking further into possible influences 
upon drug use patterns, in each instance, 
discontinuers had a slight advantage of one
point or less on drug information, while con-
tinuers had under one-half point less worry
and up to one and one-half more bad drug
experiences (as would be predicted from
their longer periods of use). None of these
differences was significant, nor were the dif-
ferences in deviance and alienation. While
these numbers are small, the results do indi- 
cate that the decision to continue or discon- 
tinue drug use, and presumably to refrain 
from or initiate use, is controlled for this 
population by factors other than those iden-
tified in the present research.

There are a great many places to look for 
the conditions controlling drug use. Some 
are doubtless related to situational factors 
such as drug availability and patterns of 
peer use, the richness of alternatives to drug 
use mediated by parents and teachers, and 
the type of health and risk-taking education 
which the student may have undergone. 
Another influence may be the drug experi- 
ence itself. For many, patterns of the use of 
most drugs tend to be mild and of short dura- 
tion (National Commission on Marijuana 
and Drug Abuse, 1972, Appendix 1, pp. 
257-258; Stuart & Schuman, 1972, pp. 90-
117). When Stuart and Schuman (pp. 136-
188) asked subjects their reasons for dis-
continuing use, “No further need for this
experience” was chosen significantly more
often than such alternatives as fear of legal, 
physical, or psychological consequences. 
Finally, factors relating to brain chemistry 
and other physiological characteristics may 
influence the decision to search for or con- 
tinue to use selected drugs. 

Drug education alone should not be ex- 
pected to reverse the impact of these multi- 
ple determinants of drug use, despite the 
zeal with which many drug educators have 
responded to community requests for action. 
The attempt to educate the citizenry rather 
than to change the controlling conditions is 
a familiar response to social crises in con- 
temporary society. Regrettably, educational 
efforts relating to the use of auto seat belts, 
smoking, alcoholism, racial integration, and
preservation of the environment have been
all too slow in achieving their goals. It is
likely that drug education may be similar in 
its results. Therefore, special attention 
should be given now to drug education re- 
search that compares the effectiveness of 
drug information as opposed to health edu- 
cation or to a program that contrasts educa- 
tional programs on one side and efforts to 
correct situational forces conducive to drug
use—for example, sterile academic environ- 
ments, unreinforcing home experiences, and/ 
or a paucity of constructive behavioral al- 
ternatives in the community—on the other. 
Furthermore, the outcome measures for 
these comparative studies should be fully 
triangulated to include an assessment of
drug use and factors believed to be related 
to drug use in addition to any consumer 
evaluation of the program that might be 
deemed necessary. Only in this way can the 
critical questions about the efficacy of drug 
education be answered. And as these ques- 
tions are answered, so, too, will be questions 
about “the effectiveness of education itself 
as a tool for countering social disorganization 
and decay [Richards & Langer, 1972, p. 
48].”

Immediate and two-week retention were studied as a function of three
levels of text readability and two levels of inserted postquestion 
difficulty. The inserted postquestion treatment was modified to permit 
review of text after question answering. A traditional control group 
was required to read without marking lesson pages; a second control 
group was permitted complete freedom. Both inserted postquestion 
treatments produced significantly inferior acquisition of content 
incidental to the inserted postquestions for two lessons having below- 
average readability. For an average readability lesson, only the difficult 
inserted postquestions produced lower acquisition. Treatment differ- 
ences diminished to nonsignificant levels on two-week retention. 
Learning was correlated with anxiety and self-confidence in the two
lower-than-average readability lessons but not in the average one.


Numerous studies have investigated the 
effects of adjunct questions on learning from 
prose text. A set of widely held conclusions 
is that inserted postquestions may raise the 
learning of (a) specific details asked about 
by the questions (Rothkopf, 1966), (b) de- 
tails categorically related to the questions 
(Rothkopf & Bisbicos, 1967), and (c) in- 
formation not directly related to the ques- 
tions (Rothkopf, 1966). When a lesson con- 
tains more information than a student can 
learn during a given study period, specific 
questions surely enhance question-related
learning by directing and focusing attention 
and by stimulating rehearsal. The intriguing 
experimental result has been that incidental 
learning is not significantly reduced as a
consequence of inserted postquestion treat- 
ments; therefore, incidental learning alone 
was investigated in this study. 

Possible bases for the effects of adjunct 
questions have been suggested by Frase 
(1970): “Questions are motivational stimuli.
They have arousal and associative outcomes
[p. 346].” Arousal may be related to drive
level, and theorists have long held that the 
relationship between learning performance 
and drive is curvilinear and that it is a
function of task complexity (Yerkes & 
Dodson, 1908). Hypothetically, arousal 
level varies with perceived question diffi- 
culty, and task complexity varies with text 
readability. To enable a test of the hypothe- 
sis that learning is a complex interactive 
function of such text and question char- 
acteristics, two sets of inserted postquestions 
having significantly different difficulty 
levels (produced by manipulation of dis- 
tractors for multiple-choice items) were 
varied factorially against a lesson written 
at three levels of readability (produced by
the manipulation of lexicon and syntax). A 
variety of aptitude measures, described 
below, were gathered and correlated with 
test results to aid in the interpretation of 
treatment effects. 

A second question of this study concerned 
the practical utility of conclusions based on 
previous research dealing with the effects of 
adjunct questions. Standard control groups 
have been instructed to read passively. 
However, observation of the study habits of 
college students indicated that passive read- 
ing is atypical; students may instead be ob- 
served to underline, highlight, write notes, 
and outline (normative questionnaire data 
were collected to estimate frequencies). Stu- 
dents in one control group of this experiment 
were, therefore, directed to study according 
to their idiosyncratic habits. A second con- 
trol group was given the traditional direc- 
tion to read only. 

The external validity of research on prose 
study techniques also requires testing of 
the experimental techniques as they might 
actually be utilized by students. However, 
in contemporary research on the use of
adjunct questions, the dominant technique
that has been employed for theory devel-
opment is one that students would surely
disregard if a course grade or graduation
were contingent on achievement. Specifi-
cally, subjects have been prevented from
looking back at the lesson once they have
seen the postquestion; they have also been
prevented from reviewing the lesson prior
to administration of the criterial test. A
second problem confronting attempts to
generalize from the existing literature to
applied practice concerns the fact that ex-
perimenters have neither explained the pur-
pose of the inserted questions to their sub-
jects, suggested how the questions might be
used to improve study effectiveness, nor ex-
plicitly described the relationship between
the inserted questions and the retention test.
These problems were dealt with in the
present study.

Experiments designed to improve or
add new instructional techniques must
also employ study materials similar in char-
acter to those actually in use, or external
validity is damaged. Furthermore, it may be
speculated that improved instructional
techniques are most needed for course mate-
rials that students find difficult and unin-
teresting. Given these considerations, an
effort was made to select a lesson that stu- 
dents would find boring and hard to study. 


METHOD


Lessons 


The introductory section of a college-level 
mathematics text was selected. It contained a de-
scription of the abstract, deductive character of
modern mathematics, a sketch of its history, and 
the authors’ philosophical perspectives. The read- 
ing difficulty level was obviously high because of 
difficult vocabulary and syntax. Of a 68-sentence 
total, 29 sentences contained 25 or more words. 
The text contained 1,687 words, and the average 
number of words for all independent clause units 
(or simple sentences) was 22.0. 

Two versions of the original text were prepared 
by breaking long sentences into shorter ones and 
by replacing uncommon terms with more familiar 
synonyms. In addition, a concrete example was 
introduced to clarify an abstract discussion. A 
moderately revised version contained 1,874 words 
and its independent clause units averaged 18.4 
words. A highly revised version contained 1,961 
words and averaged 14.9 words per clause. Here- 
after, the original lesson is referenced as having 
low readability (that is, as being very difficult), the 
moderate revision as having moderately difficult 
readability (moderately difficult), and the major 
revision as having average readability (average 
difficulty). Subjective scaling data and results for 
comprehension testing which support these de-
scriptions are reported.

The lessons were collated into 11-page booklets 
consisting of a cover page containing general ex- 
perimental directions, a second page explaining 
the particular study technique, and a 9-page text, 
with the exception that booklets containing in- 
serted question treatments had 9 additional pages 
for the questions and a blank page placed between 
each text page and its associated question. 


Treatments 


Four treatments were employed: relatively easy 
inserted postquestions, relatively difficult inserted
postquestions, passive reading, and idiosyncratic
study.

Inserted postquestions. The student was in- 
structed to study each lesson page until confident 
that he could answer a question on its contents. 
It was explained by the directions that the “ques- 
tions are designed to serve you as check points” 
and that answers were not given for two reasons: 
(1) the answers are in the lesson material just
read, and you may look back if you have any 
doubts concerning the correct answer, (2) previ- 
ous research has shown that most students 
simply look for the answers provided rather than 
studying. If you feel reasonably confident about 
the answer to each question, then you have a 
sign that your progress is adequate. On the other 
hand, if you are not sure and have to keep look- 
ing back, then you have a sign to slow up and 
study harder. 


Although students were permitted to check their 
answers, they were required to respond on the data 
sheets before turning back and were barred from 
erasing to preserve this data for analysis. 

It seemed necessary to divide the text into 
relatively small sections so that limitations im- 
posed by immediate memory capacity would not 
make the study technique unmanageable. With 
sufficiently small amounts to be learned for each 
postquestion, it was hoped that students would 
be encouraged to rehearse in preparation for the 
unknown question and that they would thereby 
improve retention. Based on this consideration, 
the original text was divided into nine sections 
with a median length of 199 words and with the 
division points placed carefully to avoid breaking 
continuity in thought (divisions were at paragraph 
boundaries in eight instances). The section length 
of about 200 words was also chosen because it 
corresponded to the interval that Frase (1968) 
had found optimal for raising retention on a com- 
bined test of incidental and repeated questions. 
Both revised lessons were divided at the same 
points as the original lesson. 

A single multiple-choice question stem was 
written in the knowledge or low-level compre- 
hension domains of the Bloom (1956) taxonomy 
for each of the nine sections. The questions may 
also be described by Anderson’s (1972) categories 
as containing one verbatim item, four transformed 
verbatim items, and four transformed paraphrase 
items. These questions were specifically, rather than
randomly, selected to prevent the occurrence of 
any spurious patterns. Two sets of adjunct ques- 
tions were formed by manipulating the semantic 
relatedness of the three distractors. One set of
adjunct questions was organized to contain all 
of the easy items and the other set was organized 
to contain all of the hard ones. The adequacy of 
this manipulation was directly tested by exam- 
ining the item difficulty indices calculated from 
data provided when the students recorded their 
answers on data sheets during the experiment. 
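An item difficulty index is conventionally the proportion of students who answer an item correctly, so a higher value marks an easier item. The sketch below illustrates that calculation under this conventional definition; the answer key and responses are hypothetical and are not study data, and the function name `item_difficulty` is introduced only for illustration.

```python
def item_difficulty(responses, key):
    """Proportion of students answering each item correctly (higher means easier)."""
    n_students = len(responses)
    return [
        sum(1 for student in responses if student[i] == correct) / n_students
        for i, correct in enumerate(key)
    ]


# Hypothetical answer key and recorded answers for three items and four students.
answer_key = ["b", "d", "a"]
responses = [
    ["b", "d", "a"],
    ["b", "c", "c"],
    ["b", "d", "c"],
    ["b", "c", "b"],
]

difficulty = item_difficulty(responses, answer_key)
print(difficulty)
```

Comparing the mean index of the easy set with that of the hard set, item by item, would be one way to confirm that the distractor manipulation produced the intended difference in difficulty.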

Passive reading. Students worked according to 
the following instruction: “The method you are 
to use is the common study technique of reading. 
You may study in any manner that you please; 
only do not use your pen or pencil as a study aid.” 

Idiosyncratic study. The direction was to “study 
just as you do typically. If you like to underline, 
or pen notes in the margins, or just read straight, 
then do so now.”


Questionnaire


Immediately after studying the experimental 
lessons, but prior to taking the test, the students 
were given either a blank sheet on which to write 
opinions or a questionnaire which asked about (a) 
interest in the lesson topic, (b) readability level 
of the text, and (c) merit of the experimental 
study method. (The questionnaire was given before 
testing to avoid possible artifactual effects; for 
example, those who studied inserted questions 
might have reacted negatively when finding them 
missing from the test.) Ratings for these questions 
were secured through use of 5-point rating scales. 
The students were also questioned about their 
study habits. 


Tests 


Acquisition and retention. A multiple-choice 
test composed of 28 items, with four choices per 
item, was given immediately after the question- 
naires were collected. Two weeks later, the same 
test was administered without prior warning to 
students having studied the very difficult lesson. 
Students having studied the moderately difficult 
and the average lessons were given 10 minutes to 
restudy their lesson booklets prior to retaking the 
test; this procedure was designed to explore the 
possibility that note writing and underlining of
lesson material facilitates review activities and
to improve external validity, on the assumption
that most students study or review before taking a
test. Twenty minutes were provided for both im- 
mediate and retention testing. 
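
For orientation, the score expected from blind guessing on a 28-item, four-choice test can be worked out directly. The sketch below assumes independent, equiprobable guesses on every item, which is an idealization and not a claim about how students actually responded; the variable names are illustrative.

```python
import math

n_items = 28        # items on the criterial test (from the text)
n_choices = 4       # choices per item (from the text)

p_guess = 1 / n_choices                       # assumed chance of a correct guess
expected_score = n_items * p_guess            # mean of a binomial(28, 0.25)
sd_score = math.sqrt(n_items * p_guess * (1 - p_guess))

print(f"expected chance score: {expected_score:.1f}")
print(f"standard deviation under guessing: {sd_score:.2f}")
```

On these assumptions the chance level is 7 items correct, which gives a reference point for the "below chance level" criterion mentioned in the Procedure and for the pretest mean of 7.5 reported for the transfer study.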

Transfer study. The possibility that the in- 
serted questions could provide direct positive 
transfer to the criterial test, which contained items 
similar in type to the adjuncts but did not con- 
tain any of them, was investigated. A group of 21 
students first took the test without any prepara- 
tion. After this pretest, they were given handouts 
containing the adjunct question stems and an-
swers and were then retested with the handouts as
an available aid. 

Test fidelity. The criterial test had been con- 
structed directly from the original lesson. To check 
the possibility that test information was not 
adequately preserved or represented in the two 
revisions, a separate experiment was conducted. 
Forty students in two course sections of under- 
graduate educational psychology were randomly 
assigned to one of three groups, each of which 
studied one of the three lessons for 25 minutes. 
These students were led to believe that they would
be given a closed-book test to insure that they had
acquired a thorough acquaintance with the lesson 
contents. However, an immediate open-book test 
(20 minutes completion time) was given instead to 
minimize the possible effects of writing style on 
retention and to reflect more accurately the 
available text information. By accepting the as-
sumption that text information was preserved,
mean test scores can also be used to index
changes in lesson comprehensibility brought about
by the text revisions. 


Participants 


Students of the undergraduate educational psy-
chology course, winter and spring quarters, 1971-72,
at Southern Illinois University, Carbondale, partic- 
ipated to fulfill a research requirement. Data were 
collected from approximately 700 students, but 
certain losses occurred. It was discovered after 
data collection that 15 booklets for the very diffi- 
cult lesson, hard-question treatment were im- 
properly collated. A number of students for whom 
immediate retention data had been collected were 
absent during the second-week retention-testing 
periods. Attitude data were missing for a small
percentage of students, but American College Test- 
ing Program, English subscale data were missing 
for about one third of the sample. To improve the 
power of significance tests, the largest sample 
available for each analysis has been used. As a 
consequence, means, standard deviations, and 
sample sizes are reported for the several analyses 
presented in Tables 1, 2, and 3 of the Results 
section. It may be noted that the sample sizes of 
the idiosyncratic treatment for the very difficult 
and moderately difficult lessons are relatively low; 
less data had been collected here since the booklets 
for the other treatments could be reused. It should 
also be observed that more data were collected for 
the very difficult than the moderately difficult 
lesson and for the moderately difficult than the 
average lesson, in accordance with the major in- 
terests of this experiment. 


Aptitude Measures 


Verbal ability scores (American College Test-
ing Program, English subscale) were obtained
from administrative records for the majority of
students. A variety of attitude measures were ob-
tained by the use of a battery administered dur-
ing the first class meeting in both quarters. The
specific scales were as follows: Alpert-Haber (1960)
Achievement Test Anxiety, Facilitating and De-
bilitating subscales; Internal-External Locus of
Control (Rotter, 1966); Dogmatism (Rokeach,
1960); Social Desirability (Crowne & Marlowe,
1964); and intellectual self-confidence, a scale de-
veloped by the investigator which is described
below.

The intellectual self-confidence scale is based 
on a construct defined as follows. Phenomenologi-
cally, the belief that one has the capacity to suc-
ceed at tasks demanding intellectual effort is the
central fact of intellectual self-confidence. Theo-
retically, the strength and scope of this belief is a
function of the individual's reinforcement history.
A history of actual, or perceived, success would
produce a positive conviction and failure a negative
self-regard. In addition, if it is assumed that suc-
cess constitutes positive reinforcement, then cues
associated with intellectual activity themselves
acquire reinforcing properties (i.e., become posi-
tive secondary reinforcers). Thus, the successful
individual develops a liking for intellectual ac- 
tivity. Furthermore, since the successful individ- 
ual has been reinforced for his own efforts, self-
reliance also has been shaped. The construct
definition for intellectual self-confidence, therefore, 
incorporates reference to three behavioral tend- 
encies: (a) expectation of success, (b) attraction to 
intellectual tasks, and (c) self-reliance. The 33 
items of the intellectual self-confidence scale have 
been written to provide a nonreactive measure of 
these three component tendencies. Scale reliability 
is approximately .75 (Cronbach α). Validation in-
formation is reported in Kirby and Hiller (1973)
and Cronbach and Snow (1974).
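
For readers unfamiliar with the reliability coefficient just cited, the following Python sketch shows the usual computation of Cronbach's α from an item-by-respondent score matrix. The data are invented (five respondents, three items) and bear no relation to the 33-item scale; the function name is likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item-score lists.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: rows are respondents, columns are items.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

The coefficient rises as items covary more strongly relative to their individual variances, so a value near .75 indicates moderate internal consistency for a scale of this length.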


Procedure 


Treatments were assigned randomly to stu- 
dents within intact classrooms (28 in the main ex- 
periment) with the restriction that each class was 
to have only one of the three lessons. The experi- 
ment was conducted during regular class hours. 
The goal of this experiment, “to develop informa-
tion on the effectiveness of different study tech-
niques,” was described to students on the cover
of the lesson booklets. The different treatments 
were not, however, described to the classes. To 
foster academic motivation, the instructor in- 
formed students that he would be apprised of any 
students scoring too low (below chance level) for 
credit. The final paragraph of the lesson cover 
stated that there would be 25 minutes of study 
time and that the lesson contained about 1,800 
words: “This means that you will have time to 
read it twice over—in other words you have enough 
time to study.” (Only a negligible number of stu- 
dents were observed to be studying actively when 
the 25-minute period ended, so that unavailability 
of time was not a factor. Questionnaire responses 
further indicated that lack of study time did not 
differentially affect study behavior among treat- 
ment groups.) The students were informed just 
prior to getting their study directions that the 
test was designed to “test your retention of facts
and your understanding of the ideas presented in
the lesson.” After reading the second page of their
booklets, which contained directions for the study 
techniques, the students began to study. 
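
The assignment scheme described above (one lesson per intact class, treatments randomized over students within the class) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how lessons were allotted to classes, so the random choice of lesson per class is hypothetical, as are the function name, the seed, and the class rosters.

```python
import random

LESSONS = ["very difficult", "moderately difficult", "average"]
TREATMENTS = ["passive reading", "idiosyncratic study",
              "easy questions", "hard questions"]


def assign(classes, seed=0):
    """Map each student to (class, lesson, treatment).

    Every student in a class receives the same lesson; the study
    treatment is drawn independently for each student.
    """
    rng = random.Random(seed)
    plan = {}
    for class_id, students in classes.items():
        lesson = rng.choice(LESSONS)  # assumption: lesson picked at random per class
        for student in students:
            plan[student] = (class_id, lesson, rng.choice(TREATMENTS))
    return plan


# Hypothetical rosters for two intact classes.
classes = {"class_A": ["s1", "s2", "s3"], "class_B": ["s4", "s5"]}
plan = assign(classes)
for student, cell in plan.items():
    print(student, cell)
```

Because lesson is constant within a class, lesson effects in such a design are partly confounded with class membership, whereas treatment comparisons are made within classes.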


RESULTS 


Transfer Study 


The group tested to determine if the in- 
serted questions provided direct positive 
transfer to the test averaged 7.5 on the pre-
test and 7.9 on the posttest (t = .24, df =
20). Hence, the criterion test measured
retention of information not directly related
to the inserted questions.


Test Fidelity


The groups given the open-book test to 
determine if test-related information had 
been adequately preserved in the two re-
vised lessons performed as follows: very
difficult lesson, M = 15.9, SD = 3.5, n = 11;
moderately difficult lesson, M = 17.6, SD =
2.5, n = 18; and average lesson, M = 19.5,
SD = 2.7, n = 11. The test means were sig-
nificantly different (F = 4.63, df = 2/37,
p < .05). In addition, the difference between
results for the moderately difficult and the 
average lessons was significant (t = 2.07, 
df = 27, p < .05) under a nondirectional 
test. Given the open-book nature of the test, 
results here may be interpreted as providing 
an objective measure of lesson comprehen- 
sion, thus verifying the effectiveness of the 
readability manipulations. 
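
As a rough check on the arithmetic, a one-way analysis of variance can be recomputed from the group means, standard deviations, and sample sizes given above. The sketch below is illustrative only: the function name is hypothetical, and because it works from the rounded summary values printed in the text, its result can only approximate the reported F.

```python
def one_way_f(groups):
    """One-way ANOVA F ratio from (mean, sd, n) summaries.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for _, _, n in groups)
    k = len(groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between, df_within = k - 1, total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Summary statistics as printed in the text (open-book test).
fidelity_groups = [
    (15.9, 3.5, 11),   # very difficult lesson
    (17.6, 2.5, 18),   # moderately difficult lesson
    (19.5, 2.7, 11),   # average lesson
]
f_ratio, df_b, df_w = one_way_f(fidelity_groups)
print(f"F = {f_ratio:.2f}, df = {df_b}/{df_w}")
```

The degrees of freedom reproduce the reported 2/37, and the F ratio comes out at roughly 4.4, in the neighborhood of the reported 4.63; the remaining gap is presumably attributable to rounding in the published means and standard deviations.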


Inserted Question Difficulty 


The easy inserted questions achieved an 
average difficulty level across the three 
lessons of 69%, while the hard questions 
averaged 53% (F = 65.3, df = 1/323, p <
.001). Neither lesson differences nor the
interaction between question treatments and 
lessons approached significance. 


Immediate Retention 


Analysis of immediate test results (see
Table 1) demonstrated that idiosyncratic
study and passive reading were similar to
each other but superior to the inserted ques-
tion treatments. Overall analysis of variance
for the experiment determined that treat-
ment effects were highly significant (F =
11.6, df = 3/663, p < .001).³ Lesson effects
also were found to be significant (F = 6.86,
df = 2/663, p < .01). The interaction be-
tween lessons and study treatments was not
significant (F = 1.08, df = 6/657) for the
experiment as a whole. However, inspection
of immediate retention means suggests that
lessons and treatments did interact for the
inserted questions, but that uniform per-
formance for passive reading and idiosyn-
cratic study obscured the interaction anal-
ysis. In fact, separate analysis for the
inserted question data does indicate a tend-
ency toward interaction (F = 2.78, df =
2/321, p < .06).

³ The analysis of variance was performed by the
multiple regression technique and program de-
scribed in Kelly, Beggs, and McNeil (1969).

In light of the increase in comprehension
scores across the lessons, it is interesting to
observe that immediate retention increased
from the very difficult lesson (M = 12.5,
SD = 3.1) to the moderately difficult lesson
(M = 13.5, SD = 3.8), but that immediate
retention for the average lesson (M = 13.5,
SD = 3.3) did not increase further.


Two-Week Retention


Retention data (see Table 1) demon-
strated no significant effects for the very
difficult lesson analyzed separately or for
the revised lessons analyzed together. Since
results for the immediate test indicated that 
the inserted questions lowered the retention 
of information incidental to the questions 
and that the two control groups learned 
about equally well, two-week retention data 
have also been analyzed separately for the 
control and experimental groups using the 
two revised lessons. Results for the inserted 
question groups did not demonstrate signifi-
cant differences in retention, but the con-
trols did differ significantly across the
moderately difficult and the average lessons; 
retention was higher for the moderately 
difficult lesson (t = 2.63, df = 150, p < .05). 
Parallel to this result, the controls did not 
show a significant loss of retention in the 
moderately difficult lesson, while controls 
using the average lesson did drop. 


Aptitude-Retention Correlations 


Correlations between aptitude and im- 
mediate retention scores are displayed in 
Table 2, and summary statistics are dis- 
played in Table 3. In the very difficult 
lesson, the measure of academic confidence 
(intellectual self-confidence scale) cor-
related significantly with scores of students
having passive reading and inserted question
treatments and was, in fact, most highly cor-
related with performance by the hard-ques-
tion group. Furthermore, in the moderately
difficult lesson, the intellectual self-con-
fidence scale correlated only with scores for
the hard-question group, while in the average
lesson, it correlated with none. The facilitat-
ing test anxiety measure correlated with
both control group results and with those for
hard questions in the very difficult lesson,
but was not correlated for any other treat- 
ment conditions. Debilitating anxiety pro- 
duced correlations that mirrored those for 
the intellectual self-confidence scale. Verbal 
ability was generally correlated with per- 
formance for the various treatments and 
showed a tendency in all three lessons to be 
more predictive of performance for passive 
reading than for idiosyncratic study (see 
Footnote b in Table 2). 

Results for Dogmatism, Social Desira-
bility, and Internal-External Locus of Con- 
trol scales were not generally significant 
and, therefore, are not reported here. 

Two-week retention-aptitude correlations
for the identical students employed in the
immediate retention analyses are displayed
in Table 2. Here the intellectual self-con-
fidence scale emerged as a significant cor-
relate for the idiosyncratic treatment of the
very difficult lesson, while the correlation for
hard question dropped from .43 to .26. In
the moderately difficult lesson, correlations
for passive reading and easy question were
raised to significant values, while the cor-
relation for hard question dropped from .41
to .18. Facilitating anxiety maintained sig-
nificant correlation in the very difficult
lesson only for passive reading but rose
to significance for the idiosyncratic study
treatment of the moderately difficult lesson.


TABLE 1
Test Results

                                          Lesson
                       Very difficult    Moderately difficult       Average
Treatment               M     SD    n       M     SD    n        M     SD    n

Immediate retention
  Passive reading     13.1   3.3   86     14.4   3.9   68      13.7   3.2    ?
  Idiosyncratic       13.3   2.1   59     14.4   3.7   41      1?.2   3.3   5?
  Easy questions      12.1   3.1   81     12.2   3.3   60      13.4   3.3   34
  Hard questions      11.4   3.1   64     13.1   3.9   55      12.2   3.0   33
Two-week retention
  Passive reading     11.1   3.1   71     13.7   4.9   54      1?.6   4.1   27
  Idiosyncratic       11.1   2.4   47     13.7   4.0   33      11.9   3.2   38
  Easy questions      10.6   3.3   69     11.9   3.8   47      11.4   3.5   24
  Hard questions      10.2   2.9   53     12.9   4.3   46      12.2   3.3   25

Interrelatedness of the aptitude variables
is shown by the following correlations (N =
427): Intellectual Self-Confidence × De-
bilitating Subscale = −.37; Intellectual
Self-Confidence × Facilitating Subscale =
.23; Intellectual Self-Confidence × Ameri-
can College Test = .31; Debilitating Sub-
scale × Facilitating Subscale = −.39; De-
bilitating Subscale × American College
Test = −.27; Facilitating Subscale ×
American College Test = .20.
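
To give a sense of scale for these intercorrelations, each r can be converted to a t statistic with N − 2 degrees of freedom. The sketch below does so for the six coefficients listed above; the dictionary labels and function name are illustrative, and the two-tailed .05 critical value of roughly 1.97 for 425 degrees of freedom is supplied here as a reference point rather than taken from the article.

```python
import math

N = 427  # sample size given in the text

# Pairwise correlations as listed in the text (labels abbreviated).
correlations = {
    ("ISCS", "DA"): -0.37,
    ("ISCS", "FA"): 0.23,
    ("ISCS", "ACT-E"): 0.31,
    ("DA", "FA"): -0.39,
    ("DA", "ACT-E"): -0.27,
    ("FA", "ACT-E"): 0.20,
}


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


t_values = {pair: t_for_r(r, N) for pair, r in correlations.items()}
for pair, t in t_values.items():
    print(pair, round(t, 2))
```

Even the smallest coefficient (.20) yields a t above 4 with a sample of this size, so all six relationships are reliably nonzero, although none accounts for more than about 15% of shared variance.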

Data were graphed and examined for dis- 
ordinal aptitude-treatment interactions. 
Nonsignificant but consistent patterns sug- 
gested that low-aptitude students performed 
best under idiosyncratic directions, while 
high-aptitude students performed best under 
passive reading in the very difficult lesson 
(the general lack of significant correlations 
in the two revised lessons led to a nonsig- 
nificant F test of interaction for all lessons, 
treatments, and aptitude variables combined
in one analysis).


Questionnaire 


Readability. Subjective ratings for lesson
readability demonstrated that the text
manipulations were effective (F = 20.6,
df = 2/400, p < .001); treatment and inter-
action effects were not significant.

Lesson interest. Interest ratings varied
significantly across lessons such that the
more readable the lesson, the higher the
interest (F = 4.68, df = 2/400, p < .01);
treatment and interaction effects were not
significant.

Method liking. The students indicated
strong differences in their liking for the
different study techniques (F = 12.5, df =
3/400, p < .001); lessons and interaction did
not show significant effects. Idiosyncratic
study was rated most favorably, followed
by easy question, hard question, and passive
reading, in that order.

Normative study habits. Approximately
47% of the students reported that they
usually write notes and/or underline when
studying, but only 17% of those in the
idiosyncratic conditions wrote any notes,
and most of these stopped writing notes by
the third page; 84% of the students reported
that they sometimes or usually underline
when studying, and 73% of the students in
the idiosyncratic condition did underline.


TABLE 2
IMMEDIATE- AND TWO-WEEK-RETENTION-APTITUDE CORRELATIONS

                              Immediate retention-aptitude       2-week retention-aptitude
Lesson/treatment       nᵃ     ISCS     FA      DA     ACT-Eᵇ     nᵃ     ISCS     FA      DA

Very difficult
  Passive reading      63     .30*    .32*   −.25*    .28        43     .32*    .26*   −.18
  Idiosyncratic        44    −.02     .32*   −.10     .45*       38     .38*    .06    −.11
  Easy questions       61     .31*   −.04    −.21     .66*       45     .17     .02    −.13
  Hard questions       51     .43*    .33*   −.43*    .30        37     .26     .23    −.41*
Moderately difficult
  Passive reading      51     .21     .10    −.20     .?7*       37     .30*    .17    −.05
  Idiosyncratic        30     .18     .22    −.23     .56*       23     .?3     .39*   −.26
  Easy questions       38     .12    −.11     .07     .25        27      ?       ?      .05
  Hard questions       44     .41*    .1?    −.45*    .29        36     .18     .00    −.28
Average
  Passive reading      25     .09      ?      .07      ?         21    −.11     .?1    −.14
  Idiosyncratic        34     .12     .12    −.25     .22        31    −.06     .14    −.15
  Easy questions       22     .26     .06    −.14     .50        15     .00     .?8    −.24
  Hard questions        ?      ?       ?       ?       ?          ?      ?       ?       ?

Note. Abbreviations: ISCS = intellectual self-confidence scale; FA = Facilitating subscale and DA =
Debilitating subscale of the Alpert-Haber Achievement Test Anxiety Scale; ACT-E = American
College Testing Program, English subscale.
ᵃ Sample sizes refer to all statistics except those for ACT-E.
ᵇ Correlations between ACT-E and immediate retention scores, based on all data available, were as
follows (sample sizes given in parentheses): very difficult: passive reading = .46* (50), idiosyncratic =
.33* (43), easy question = .60* (52), hard question = .35* (41); moderately difficult: passive reading =
.43* (44), idiosyncratic = .33 ns (30), easy question = .18 ns (35), hard question = .29 ns (42);
average is the same as in the above table.
* p < .05, nondirectional test.


DISCUSSION


The inserted postquestion procedure, as
modified to permit text review, failed to
enhance retention of content incidentally
related to the inserted questions. With the
exception of relatively easy questions in-
serted in the average readability lesson, the
postquestion procedure actually tended to
depress immediate retention. A possible ex-
planation may be found in the frequency of
question pacing, which was roughly one
question every 200 words. Frase (1968)
found that postquestions inserted in a 2,000-
word text every 10 sentences tended to lower
incidental learning, but this questioning rate
also produced the highest retention scores on
a combined test of incidental and inserted
question learning and thus appeared to pro-
vide an appropriate rate for this experiment.
Frase attributed the incidental learning
depression to disruption in the text’s con-
tinuity. However, in the context of the pres-
ent experiment, this interpretation must
be regarded as equivocal, since the text was
divided so as to preserve integrity of com-
munication, and students could review to
overcome any difficulties. The pattern of
results does suggest two factors that may
have contributed to the depression, and
these are discussed below.


TABLE 3
MEANS AND STANDARD DEVIATIONS FOR APTITUDE MEASURES

                                ISCS            FA            DA               ACT-E
Lesson/treatment       nᵃ      M      SD      M     SD      M     SD      n     M     SD

Very difficult
  Passive reading      63    112.1   13.3   25.0   5.7      ?      ?      ?     ?      ?
  Idiosyncratic        44    112.8   14.4   25.6   5.0    26.9     ?      ?     ?      ?
  Easy questions       61    110.9   11.5   24.4   5.0    28.6    5.9    45   21.3    4.9
  Hard questions       51    113.6   12.0   24.2   5.1    27.0    5.6    37   20.4    4.1
Moderately difficult
  Passive reading      51    111.8   14.3   24.4   6.0    27.1    7.?     ?     ?      ?
  Idiosyncratic        30      ?     10.7   25.?   4.6    27.7    6.0    2?   22.8    4.5
  Easy questions       38    115.1   15.7   23.2   6.0    28.3    6.6    27   20.8    3.7
  Hard questions       44    113.6   15.5   25.9   6.5    27.6    7.2    36   21.0    3.6
Average
  Passive reading      25    107.5   11.7   23.4   4.?    27.9    4.9    21   19.0    3.?
  Idiosyncratic        34    112.4   12.2   23.7   5.4    26.8    5.3    31     ?      ?
  Easy questions       22    111.3   14.3   23.9   5.4    28.2    4.3    15   21.5    3.1
  Hard questions       21    11?.5   10.6   23.0   4.8    30.0    4.3    16   20.8    3.1

Note. Abbreviations: ISCS = intellectual self-confidence scale; FA = Facilitating subscale and DA =
Debilitating subscale of the Alpert-Haber Achievement Test Anxiety Scale; ACT-E = American
College Testing Program, English subscale.
ᵃ Sample sizes apply to all statistics except those for ACT-E.

Immediate retention achieved by the hard-
question group studying the moderately dif-
ficult lesson and by both inserted question
groups studying the very difficult lesson was
significantly correlated with confidence. An
experiment by Means and Means (1971)
suggested how confidence (generalized ex-
pectation of academic success) may have
affected performance. They found that stu-
dents with low grade point averages (and
presumably low success expectation) per-
formed worse in a course as the result of
being led to expect low achievement; in
contrast, students with high grade point
averages performed above control levels
after being informed that they had low
aptitude for the course. For the very difficult
and moderately difficult lessons, the ques-
tions may have achieved effects comparable
to the negative manipulation by Means and 
Means, since performance on the inserted 
questions was rather poor (on the average, 
53% correct for hard questions and 69% for 
easy ones). Additionally, the significant 
negative correlation of debilitating test 
anxiety with immediate retention for the 
hard-question groups studying the very 
difficult and the moderately difficult lessons 
implies that distracting emotional behaviors 
were evoked (see Wine, 1971). 

Examination of results for the average- 
readability lesson showed that neither 
achievement test anxiety nor academic self- 
confidence predicted immediate retention. 
Since insertion of questions occurred at 
precisely the same points for both hard- and 
easy-question treatments and the questions 
were of the same cognitive type, the fact 
that hard questions alone produced perform-
ance significantly inferior to the idiosyn-
cratic control (Newman-Keuls test, p <
.05) suggests that question difficulty was the
primary factor that contributed to this 
deficit. The mechanism underlying the 
deficit might be that students who experi-
enced difficulty with the questions con-
centrated on question-related material dur-
ing review. In the moderately difficult lesson,
immediate retention scores for the easy-
question group were not correlated with con-
fidence or anxiety; hence, the significantly
inferior acquisition performance by the
easy-question group (p < .05, compared
against the idiosyncratic control by a New-
man-Keuls test) may also be attributed to
review activities that concentrated attention
on question-related information. (Reversal
of performance by the two inserted ques-
tion groups in the moderately difficult lesson
is curious, although the difference between
these two groups is not significant at p <
.05.)

The prediction that idiosyncratic study 
would prove superior to passive reading was 
not verified. Interfering with normal study 
habits could be expected to upset some stu- 
dents, and passive reading did produce the 
lowest treatment evaluation ratings; but the 
test-anxiety-acquisition and confidence- 
acquisition correlations only tended to be 
stronger for passive reading than idiosyn- 
cratic study. A possible explanation for the 
lack of significant effects is derived from an 
analysis of the usefulness of note taking or 
outlining in relationship to the purpose of 
studying and to the contents and organiza- 
tion of lesson materials. 

Frase (1969) found that students spon- 
taneously wrote notes when studying a text 
to perform a task which placed a large 
strain on immediate memory; however, it 
was the case here that only about 17% of 
the students took notes (as compared to 
questionnaire data which indicated that 
88% usually or sometimes write notes), and
it was observed from the lesson booklets 
that note writing typically extinguished by 
the third page. In addition to the fact that 
little note writing was done, another point 
is that Schultz and Di Vesta (1972) found 
that taking notes was initially advantageous 
when the text presented an unusual struc- 
ture (information organized according to 
attributes rather than names). Both on em- 
pirical and logical grounds, it would seem 
that control subjects should be permitted 
to study according to individual preference 
or habit. 

One potentially interesting result was that
two-week control group retention for the
moderately difficult lesson was superior to
control retention for the average lesson,
despite the fact that comprehension scores
and readability ratings for the average
lesson were higher. It may be inferred that
the moderately difficult readability of the 
moderately difficult lesson successfully 
stimulated careful reading. 

It should be recognized that the negative 
findings obtained here for knowledge-level
questions may not generalize to the use of
other kinds of questions. Some students
complained that the questions were con-
cerned with details and suggested that they
would have been more interesting and useful
had they been aimed at the “main ideas.”
The finding of Watts and Anderson (1971)
that questions which required the applica-
tion of principles were more effective than
questions which required recognition of pre- 
viously described examples supports this 
proposal. It is interesting to note that Watts 
and Anderson raised the possibility that 
their application questions were more effec- 
tive simply because they were measurably 
harder; however, results for the hard-ques- 
tion groups in this experiment imply that 
additional processing demands rather than 
difficulty per se explain their results.

To sum up, results for both immediate 
and two-week retention caution against in- 
serting low-level cognitive questions into 
text as a means for promoting the learning 
of information not specifically cued. Fur-
thermore, there yet remains unanswered a
most important practical question: How 
would students actually use inserted ques- 
tions when studying for school examina- 
tions? If the inserted questions were not well 
represented in their examinations, we may 
predict that student attention to them would 
extinguish. On the other hand, high test 
relevance would encourage careful attention 
to the questions (or surrogate instructional 
objectives) during study. Unfortunately, we
might then also predict that students would
look ahead to find such valuable test clues
when studying and thereby risk the loss
of incidental learning found by Rothkopf
(1966) and others for prequestion treat-
ments. If this analysis is accurate, instruc-
tors wishing to employ adjunct aids would
have to construct study questions that com- 
prehensively represent both instructional 
objectives and examination contents.


ADJUNCT QUESTIONS AND THE COMPREHENSION
OF PROSE BY CHILDREN


INGRID SWENSON AND RAYMOND W. KULHAVY
Arizona State University 


One hundred and nine fifth- and sixth-grade children read twenty
66-word paragraphs describing a fictitious island and its people. A criti- 
cal question on each paragraph was inserted before or after 1, 5, 
10, or 20 paragraphs. The retention measure consisted of items test- 
ing both critical and incidental material, in both verbatim and lexical 
paraphrase form. Learners responded to both immediate and one-week 
retention tests. Postpresentation of questions facilitated learning, and 
retention loss was greatest for 1-paragraph learners. Critical items 
were better recalled, and there was no effect for the verbatim-
lexical paraphrase variable.


Research on adjunct questions with adult 
learners has shown that it is often possible 
to control what a person learns from read- 
ing text. The present study attempts to rep- 
licate and extend these findings with a 
grade school population. 

Numerous studies have indicated that 
postquestions facilitate test performance 
more than prequestions (Frase, 1967, 1968a, 
1968b; Rothkopf & Bisbicos, 1967). Inter- 
rogatives placed before passages tend to limit 
inspection to specific (critical) content and 
to focus attention on the relatively small 
number of words needed to answer the 
question. Conversely, questions that appear 
after passages reduce specific discrimina- 
tions and promote the acquisition of nonref- 
erent (incidental) material as well. Al- 
though these results have occurred consist- 
ently with adult populations, we were una- 
ble to find any data gathered on school 
children. Younger learners may react dif- 
ferently to inspection control devices, possi- 
bly because of an attenuated attention span 
or a lack of strongly formed reading behav-
iors. Consequently, one purpose of this
study was to replicate the effects of pre- 
and postquestions on critical and incidental 
recall with grade school subjects. 

Several studies have also varied the 
amount of material the subject reads before 
encountering a testlike event. What effects 
result when adjunct questions are paced at 
various intervals is unclear. Shorter text 
blocks favor the groups that see questions
after study, while longer segments benefit 
learners who receive interrogatives prior to 
inspection (Frase, 1967, 1968b, 1968c). In 
addition, effects of question pacing interact 
with the type of material. For critical ma-
terial, medium-length passages yield opti-
mum recall. Interestingly, scores on inci-
dental material improve gradually as more
information is presented (Frase, 1967). 
Reading longer passages may be the opti- 
mum procedure when testlike events do not 
occur during study (Frase, 1967). A second 
objective of this research is to determine ! 
these parameters are consistent using the | 
same text presentation with children. j 

According to Anderson (1972), there is
evidence that students process and store in-
structional material in at least two ways.
The first strategy, phonological encoding,
refers to verbatim storage of printed verbal
stimuli. Here a student learns by “rote,” a
string of words which is meaningless to him
but which he recalls intact during testing.
The second type of processing is semantic
encoding, which requires that the learner re-
member the meaning as well as the physical
features of the text. In this case, the student
“comprehends” the material and can iden-
tify instructional statements correctly when
they are presented in a form substantively 
different from the one initially learned. One 
means of determining if semantic encoding 
has taken place is to test recall with lexical 
paraphrases of the material originally 
learned. A paraphrase is defined as a 
parallel statement containing the same 
semantic content in a different substantive 
form. For example, “the monarch seemed 
mad at the instructor” might be transformed 
to “the king appeared angry with the 
teacher.” These representations differ sub- 
stantively but were judged identical in 
meaning by almost 90% of a high school 
sample. 

It seems possible that effects of question 
placement result from differences in the 
type of coding produced. When questions 
are placed before reading, they may pro- 
vide the learner with a phonological “tar- 
get” on which to expend his search activi- 
ties. On the other hand, when questions fol- 
low reading, the subject may attempt to 
store general context and, hence, remember 
semantic structure rather than specific ver- 
bal units (Sachs, 1967). A third purpose of 
the present study, then, was to determine if 
pre- and postquestion effects are due to dif- 
ferences in the type of coding they produce. 

The final variable investigated was reten-
tion. If recommendations from adjunct-
question research are to be instructionally
useful, it is necessary to assess their effects
over time. Placement, pacing, and encoding
parameters may change appreciably due to
factors affecting tenure in long-term storage.


METHOD 


Design and Subjects 


The design was a 2 (Before-After) × 4 (1, 5, 10,
or 20 Paragraphs) × 2 (Critical-Incidental Item)
× 2 (Verbatim-Paraphrase Form) × 2 (Immediate-
Delayed Test) with repeated measures on the
critical-incidental item, verbatim-paraphrase form,
and immediate-delayed test variables. In addition
to the eight experimental groups, two control
groups were included. One control (Control 1) read
the text without questions, and the other (Control
2) answered the questions without reading the
passage.
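
The factorial layout described above can be enumerated as a quick check. The following is only an illustrative sketch: the variable names are hypothetical, and the split into between-subjects and repeated-measures factors follows the design statement rather than any materials from the original study.

```python
from itertools import product

# Between-subjects factors: question placement and pacing
# (number of paragraphs associated with each block of questions).
placements = ["before", "after"]
pacings = [1, 5, 10, 20]

# Repeated-measures factors named in the design statement.
item_types = ["critical", "incidental"]
item_forms = ["verbatim", "paraphrase"]
test_times = ["immediate", "delayed"]

# Each (placement, pacing) pair is one experimental group.
experimental_groups = list(product(placements, pacings))

# Each subject contributes one score per combination of the repeated factors.
repeated_cells = list(product(item_types, item_forms, test_times))

print(len(experimental_groups))  # 8 experimental groups
print(len(repeated_cells))       # 8 repeated-measures cells per subject
```

The eight groups produced here correspond to the "eight experimental groups" mentioned in the text; the two control groups sit outside this crossing.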

The subjects were 128 students from two fifth-
sixth- and one sixth-grade classrooms. Participation
was dependent upon class attendance during the
two days that the experiment was conducted. Nine-
teen subjects were dropped from the analysis for
failure to complete all experimental tasks.


Materials 


A 1,320-word passage titled “The Island of Ako 
and Its People” constituted the experimental text. 
The passage was divided into 20 paragraphs, each 
exactly 66 words in length. For each paragraph, 
two unrelated questions were constructed. Next, a 
lexical paraphrase was generated for each of the 
original questions, thus equaling four questions per 
paragraph. Each question and its paraphrase had 
no substantive words in common other than articles 
and the terminal response. An example of one set 
of questions for a paragraph is 

    The islanders construct clothes from palm leaves.
    Things the natives wear are made with palm leaves.

Since making a publicly observable response in- 
creases attention to the material studied (Kulhavy 
& Parsons, 1972), the subjects were required to fill 
in the terminal blank for each item in the text. 

Both the paragraphs and the pairs of questions 
were normed for readability and semantic similar- 
ity, using 127 fifth- and sixth-grade students from 
the participating district. The norming subjects 
were familiar with all the text vocabulary except 
the words coined for places, animals, and plants 
on the island. The median similarity rating for 
one set of question pairs was 4.18, and for the sec- 
ond set it was 4.26 on a 5-point scale, with 5 
equaling highest similarity. 

One of the four possible questions for each 
paragraph was chosen as the experimental item (to 
be seen by the subject while reading). The verba- 
tim question, its paraphrase, and the remaining 
set of questions were included on the criterion test 
to measure both the encoding strategy used and the 
amount of incidental learning. The particular 
items designated as experimental were separately 
randomized for each booklet, with the restriction 
that all questions were chosen an equal number of 
times across conditions.

In the pacing conditions, the subjects read either 
1, 5, 10, or 20 paragraphs in conjunction with the 
same number of associated experimental questions. 
In the before groups, the appropriate number of 
questions were presented before the paragraphs, 
and in the after groups, the questions were given 
when the subject had completed reading. Subjects 
were required to complete the blank at the end of 
each question encountered. The booklets used in 
the experiment contained only 1 paragraph or 
question on each page. 

The criterion test consisted of all 80 constructed
items. Form A of the test contained one form of
each item and Form B the alternate form. All
subjects received both forms of the test, and the
item order was separately randomized for both
the immediate and delayed measures.

TABLE 1
MEANS AND STANDARD DEVIATIONS FOR CRITERION TEST CORRECT RESPONSES
FOR QUESTION PLACEMENT, QUESTION PACING, AND TEST TIME

                         Experimental                            Control
             Before (n = 47)        After                  Read     Questions
Test         Critical  Incidental   Critical  Incidental   only     only (n = 5)

Immediate
  M          16.28      9.62        17.80     13.09        9.36     7.91
  SD          8.30      6.69         9.05      7.10        4.52     5.38
Delay
  M          14.08      9.21        15.02     11.20        7.07     3.88
  SD          8.09      6.22         8.61      6.89        4.96     3.11
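
The item counts given in the Materials section can be verified with a few lines of arithmetic. This is a minimal sketch with hypothetical constant names. The split of each test form into 40 items, and of the 80 items into 40 critical and 40 incidental, is an inference from the description, not a figure stated in the text.

```python
PARAGRAPHS = 20
WORDS_PER_PARAGRAPH = 66
QUESTIONS_PER_PARAGRAPH = 2  # two unrelated questions per paragraph
FORMS_PER_QUESTION = 2       # the original question plus its lexical paraphrase

total_words = PARAGRAPHS * WORDS_PER_PARAGRAPH
total_items = PARAGRAPHS * QUESTIONS_PER_PARAGRAPH * FORMS_PER_QUESTION

# Assumption: Form A and Form B each carry one form of every question.
items_per_form = total_items // 2

# Assumption: the experimental question and its paraphrase count as critical,
# and the other question pair for the same paragraph counts as incidental.
critical_items = PARAGRAPHS * FORMS_PER_QUESTION
incidental_items = total_items - critical_items

print(total_words)   # 1320
print(total_items)   # 80
```

The two printed values agree with the 1,320-word passage and the 80 constructed items reported in the text.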


Procedures 


Subjects participated in two groups (ns = 60
and 68) with subjects from all conditions present 
at each session. Text booklets were distributed ran- 
domly, with the restriction that about the same 
number of booklets from each treatment appeared 
at each session. Control group materials were 
handed out at a lower ratio to conserve n in the 
experimental conditions.

FIGURE 1. Mean correct recalls for pacing conditions on both the
immediate and delayed posttests. (Horizontal axis: Number of
Paragraphs Associated with Questions.)

Each booklet contained a sheet of general in-
structions directing the learner to read carefully
and not to refer back to material previously read.
As soon as all the subjects signified that they un-
derstood the task, they were instructed to begin
working through the booklet. Immediately upon
completing the text, the subject raised his hand
and a monitor collected his booklet, recorded his
reading time, and gave him Form A of the
criterion test. When the learner had completed this
form, it was collected and he received Form B.

One week following the experimental session, 
the two forms of the criterion test were again 
administered to all participants in the same man- 
ner. 


RESULTS


Table 1 contains the means and standard 
deviations on posttest correct responses for 
the experimental and control groups col-
lapsed across the paragraph variable. A 2
(Before-After) × 4 (1, 5, 10, or 20 Para-
graphs) × 2 (Critical-Incidental Item) ×
2 (Immediate-Delayed Test) unweighted
means analysis of variance on these data
yielded significant results for the before-
after (F = 4.65, df = 1/90, p < .05), criti-
cal-incidental item (F = 10.69, df = 1/90,
p < .01), and immediate-delayed test (F =
76.19, df = 1/90, p < .01) main effects and
the 1, 5, 10, or 20 Paragraphs × Immedi-
ate-Delayed Test interaction (F = 4.66, df
= 3/90, p < .01). No other terms in this
analysis reached significance. An analysis
of simple effects for the 1, 5, 10, or 20 Para-
graphs × Immediate-Delayed Test interac-
tion yielded significance between immediate
and delayed measures only for learners who
had questions associated with individual
paragraphs (F = 4.67, df = 3/90, p < .01).
Figure 1 shows the form of the 1, 5, 10, or
20 Paragraphs × Immediate-Delayed Test
interaction.

A 2 (Before-After) × 4 (1, 5, 10, or 20
Paragraphs) × 2 (Immediate-Delayed
Test) unweighted means analysis of vari-
ance on the verbatim-paraphrase variable
yielded no statistical significance.

A 2 (Before-After) × 4 (1, 5, 10, or 20
Paragraphs) analysis of reading times was
also computed. Again, none of the terms in
this analysis were significant. Clearly, post-
test scores cannot be attributed to differen-
tial study time.


DISCUSSION


The superiority of groups receiving ques- 
tions after reading is consistent with avail- 
able data. However, contrary to previous 
research, postquestions failed to differen- 
tially facilitate learning of incidental items. 
The fact that critical items were learned 
consistently better, but did not interact 
with other variables indicates that storage 
for learners of this age is best served by 
specific cueing devices. It may be that chil-
dren disregard incidental material when
cues are available to them. This finding 
buttresses our contention that control of in- 
spection behaviors is a markedly different 
task with younger learners. 

The absence of main effects for the pac-
ing variable is consonant with earlier stud-
ies. However, the Pacing × Placement rela-
tionship found by Frase (1968b, 1968c)
failed to occur. Again, this lack of agreement 
with our data may stem from our subjects’ 
inability to efficiently store as much informa- 
tion as more experienced learners. 

These data do not support our hypothesis 
that adjunct questions act on the type of 
coding in which the reader engages. With 
prose material, children seem able to pro-
cess enough of the passage “meaning” to al-
low recall of verbatim or paraphrased items
equally well. This is an important point for 
further research, since there are studies 
which suggest that semantic development is 
far from complete in elementary age chil- 
dren (Palermo & Molfese, 1972). 

The Pacing × Time of Test interaction is
of interest. Over a retention interval there
is a trend for forgetting to decrease as a
function of the number of paragraphs seen
prior to test items. However, the decrement
is significant only for the subjects who had
questions associated with one paragraph.
Apparently, less information is lost over
time when the text and associated questions
are learned as larger units. Perhaps reading
longer segments before testing serves to as-
sist the subject in effectively storing more
general context from the passage. Here,
a more intact presentation yields greater re-
tention simply because the subject is able to
answer questions with reference to the total
theme of the story rather than to its specific 
elements. Conversely, learners receiving the 
fragmented presentation may learn more 
specific information, but this type of mate- 
rial has a greater chance of being forgotten. 
If this reasoning is sound, the most efficient 
use of adjunct questions would be to place 
them in relation to larger blocks of text. 

This study suggests that adjunct ques- 
tions may yield different results as a func- 
tion of both the learner’s age and the 
amount of time that passes before the post- 
test is given. Future research might concen- 
trate on further assessing the degree to 
which learning involving adjunct devices is 
influenced by the age and language sophis- 
tication of the subject population.


LEARNING BY OBSERVING VERSUS LEARNING BY DOING


DOUGLAS K. CHALMERS


University of California, Irvine 


Differences between performers and observers were investigated in a 
concept-transfer task. The method involved assessing the transfer due 
to reversal, nonreversal, and irrelevant or control shifts as a function of 
either performing or observing during an initial training task. The 
hypothesis that conceptual responses are readily learned by observa- 
tion was confirmed. When error rates of performers and observers were 
compared on the shift tasks, it was found that (a) performers, but not
observers, showed negative transfer on the reversal-shift task and 
(b) observers showed a smaller overall error rate than performers. 
These effects were consistent with the second hypothesis, which 
posited a relatively reduced degree of associative interference as a 
function of observational training experience. 


Experimental research on learning as a 
function of the observation of the behavior 
of other persons has emphasized the replica- 
tion of the specific responses the model has 
been observed to perform. Little attention 
has been given to the transfer or generaliza- 
tion of the observed experience to new situa- 
tions, despite the acknowledgment in dis- 
cussions of observational learning of the im- 
portance of this process in social develop- 
ment (cf. Aronfreed, 1969; Bandura, 1969; 
Rosenbaum & Arenson, 1968). 

The present study purports to examine 
some aspects of the differences between di- 
rect participation and observation in con- 
cept learning and transfer. For this kind of 
comparison to be made adequately, it is 
essential that the experimental design in- 
clude a Task 2 measure of the behavior of 
the Task 1 performer as well as that of the 
observer. Only a few studies (but none in-
cluding concept-learning tasks) have in-
cluded this type of comparison (cf. Bruning,
1965; Riopelle, 1960; Rosenbaum, 1967). 

The technique employed in the present
study involves a comparison of the transfer
effects of reversal and nonreversal shifts as
a function of performing or observing in the
initial task. Unlike studies of incidental
learning by observation (Bandura & Hus-
ton, 1961; Berger, 1961), the observer was
initially instructed that he would later per-
form and that the observed task would be
relevant to his later performance. A reversal
shift typically involves learning two suc-
cessive sorting tasks with correct responses
for Task 2 being based on the same dimen-
sion of the stimuli as on Task 1 but with the
subject being required to re-pair the previ-
ous sorting responses with the dimension
values. Nonreversal shifts require learning
a second task based on some dimension of
the stimuli which was present but irrelevant
on Task 1. For example, if Task 1 required
sorting cards into certain categories on the
basis of shape, a reversal shift would re-
quire a new sorting scheme also based on
shape, whereas a nonreversal shift might
involve one based on the color of the cards.

Previous investigations with the concept-
shift task have employed performers only.
Experiments with college students as per-
formers have consistently found that re-
versal shifts are learned more quickly than
nonreversal shifts (cf. Wolff, 1967). Kendler
and Kendler (1968) interpret this effect by
employing mediational constructs in a
stimulus-response framework. By assuming
that mediating conceptual responses have
been acquired on Task 1, a reversal shift
requires fewer new associations than does 
a nonreversal shift. In a study by Isaacs and 
Duncan (1962), it was found that a reversal 
group showed Task 2 learning inferior to 
that of a control group in which Task 1 
consisted of an entirely different set of 
stimulus dimensions from those relevant on 
Task 2. They reasoned that the subjects 
performing a reversal shift must extinguish 
Task 1 specific associations between stimuli 
and task responses before Task 2 learning 
can occur, whereas this was not the case for 
subjects in the control group. The present 
study also made use of this type of control 
condition, in which the specific effects of 
Task 1 were irrelevant to Task 2. 

In extending Miller and Dollard’s (1941) 
analysis of imitation to observational-learn- 
ing situations, some writers have empha- 
sized the role of covert imitative responses 
occurring during observation (Aronfreed, 
1969; May, 1946; Rosenbaum & Arenson, 
1968). Having learned in the past that 
matching the responses of others frequently 
leads to rewards, children acquire a gen- 
eralized tendency to imitate, which may be 
expressed without the necessity of overt 
matching behavior. Learning will occur as 
a result of observation when implicit or 
covert matching responses come to be elic- 
ited by the appropriate situational stimuli 
alone. The covertly learned responses can 
then be utilized in later contexts. 

It is assumed that both performers and 
observers covertly respond to the relevant 
stimuli during learning. Furthermore, the 
effect of the performer’s overt response is 
seen simply as adding another element to
his total response. It is held, therefore, that
while the total response of the performer
consists of both covert and overt compo-
nents, the response of the observer consists
of the covert component alone.

In providing a test of transfer of both
specific stimulus-response associations and
implicit conceptual responses, the reversal-
nonreversal technique affords distinct ad-
vantages for studying observational learn-
ing. Conceptual responses are, of course,
covert for both performers and observers.
On the assumption that covert responses
are learned as readily by observers as by
performers, it is predicted that the usual 
finding of superiority of a reversal over a 
nonreversal shift will hold equally well for 
observers. 

In regard to specific associations, on the 
other hand, whereas observers are assumed 
to have acquired covert associations alone, 
performers have acquired both overt and 
covert associative components. It is further 
predicted, therefore, that observers will un- 
dergo less specific interference from their 
initial (Task 1) associations than will per- 
formers on the shift tasks. 


METHOD 


Materials 


Materials were identical to those employed by 
Walther (1962). The stimuli were slides showing 
figures varying in three dimensions. For reversal 
and nonreversal groups on Task 1 and for all sub- 
jects on Task 2, the dimensions were shape (square, 
circle, diamond, and cross), number of figures (1, 
2, 3, and 4), and location of figures (top, bottom, 
right, and left). All figures were black on a white 
background. For the control groups on Task 1, the 
figures, all of which were square, varied in color 
(red, green, blue, and yellow), size (small, medium- 
small, medium-large, and large), and location (top, 
bottom, right, and left). 

In order to eliminate continued reinforcement
of Task 1 responses on Task 2 for the nonreversal 
groups, certain combinations of shape and number 
never appeared together. These were square and 
1, circle and 2, diamond and 3, and cross and 4. 
Due to the exclusion of these combinations, at the 
beginning of Task 2 there was no associative 
strength between the values of the relevant stimu- 
lus dimension and the correct responses for either 
the reversal or the nonreversal groups. After 
eliminating these combinations, a total of 48 
stimuli remained to be presented to these groups 
in Task 1. 
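
The count of 48 stimuli follows from crossing the three dimensions and removing the four excluded shape-number pairings at every location. The sketch below reproduces that count. The variable names are illustrative only, and the code is not the procedure used to prepare the original slides.

```python
from collections import Counter
from itertools import product

shapes = ["square", "circle", "diamond", "cross"]
numbers = [1, 2, 3, 4]
locations = ["top", "bottom", "right", "left"]

# Shape-number combinations that never appeared together.
excluded = {("square", 1), ("circle", 2), ("diamond", 3), ("cross", 4)}

# Full crossing is 4 x 4 x 4 = 64; each excluded pair removes 4 locations.
stimuli = [
    (shape, number, location)
    for shape, number, location in product(shapes, numbers, locations)
    if (shape, number) not in excluded
]

# How often each location occurs with each shape in the remaining set.
location_by_shape = Counter((shape, location) for shape, _, location in stimuli)

print(len(stimuli))                    # 48
print(set(location_by_shape.values())) # {3}
```

Every location appears three times with each shape, which is consistent with the statement that the four locations occurred equally often with each value of the other dimensions.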

To equate the number of stimuli on Task 1 for 
reversal, nonreversal, and control groups, certain 
combinations of color and size were eliminated 
from the Task 1 stimuli presented to the control 
groups. These were red and small, green and 
medium-small, yellow and medium-large, and blue 
and large. In all conditions, the four locations 
occurred equally often with each value of shape, 
number, and color. 

The stimuli, having been photographed on 
16-millimeter film, were mounted on 2 X 2 inch, 
16-millimeter slide mounts for projection by a 
Kodak Carousel slide projector. For all sets of 
Task 1 stimuli, two sequences of the 48 slides 
were made, such that each slide appeared once in 
each sequence and assignment to position in the
sequence was based on a table of random numbers.
The only restriction on the random order was 
that no value of any dimension should immediately 
follow itself. Since the slide drum for the Carousel 
projector holds a maximum of 80 slides, the last 
16 slides in the series were eliminated. This left a 
continuous series of 80 slides for Task 1. 


Apparatus 


Totally separated by a partition of black ply- 
wood, the two subjects (in the observe condition) 
were seated in such a way that the confederate was 
invariably to the left side of the naive subject. The 
subjects faced a milk-glass screen (36.5 centimeters 
high and 42 centimeters wide), mounted on a 
fiberboard partition which was painted flat black 
and placed on a table. This partition served to 
shield the experimenter from the view of the 
subjects. The screen was at eye level and equidis- 
tant (1.1 meter) from each subject. In the perform 
condition, only the naive subject was present, and 
he was always seated on the same side of the 
partition as the naive subjects in the observe 
condition. 

The slides, projected on the back of the milk- 
glass screen, were approximately 15 centimeters 
square. A green and a red jewel light were centrally 
located 7.5 centimeters and 15 centimeters, respec- 
tively, above the screen. 

Responses were four syllables selected from the 
Glaze list of 40% association value syllables: PEX,
KUG, MOF, and WIJ (Hilgard, 1951). Each of the
four nonsense syllables, printed in large black
letters, was located at a different corner of a white
8 × 11 inch pasteboard card. The card was taped
to the fiberboard 1.25 centimeters directly below 
the milk-glass screen. This card was available 
throughout the session for reference. Response 
familiarization was given as part of the in- 
structions. 

On the appropriate side of the plywood partition 
separating the subjects, there was a printed letter, 
"A" for the confederate and "B" for the naive 
subject. The experimenter used these letters in
referring to the subjects in the observe condition. 
Attached to the underside of the confederate’s 
chair, hidden from view, was a metal container 
which made available to the confederate his 
response card (a 5 X 8 inch note card) at the 
beginning of each observe session. The particular 
response sequences on the cards were prearranged
by the experimenter, as described below. 

The subject responded to the stimuli by saying 
aloud one of the four syllables. If the response 
was correct, the green light came on. If the re- 
sponse was incorrect, the red light came on. In 
either case, the light appeared as soon after the 
subject’s response as the experimenter could press
his signal switch and remained on for approxi-
mately three seconds. Immediately after the signal
light went off, the experimenter presented the next
slide.


Design


To increase the precision and efficiency of the
experiment, a confederate (an advanced under-
graduate psychology major) was employed
throughout the observe condition of the experi-
ment. This procedure served a threefold purpose.
Most importantly, by employing a confederate as
the Task 1 performer in the observe condition, it
was possible to exactly match each Task 1 per-
formance in the observe group with a Task 1
performance in the perform group. Thus it was
insured that a comparison of Task 2 performance
between perform and observe groups would be
independent of both the number of trials to Task 1
criterion and the particular learning sequence in-
volved in the attainment of the Task 1 concept.
Second, the addition of a confederate in the ob-
serve condition made certain that each of the Task
1 responses was quite clear and audible for the
observer's reception. Third, the fact that the same
person (the confederate) always served as Task 1
performer in the observe group insured a constant
stimulus situation for this group.

In the perform group on Task 1, an equal
number of subjects learned a shape, number, or
color concept to a criterion of 10 consecutive cor-
rect responses. On Task 2, half of the subjects in
each group learned a shape concept and half a
number concept. Those who learned a shape con-
cept on Task 1 and a shape concept on Task 2,
with re-pairing of concept instances and responses
on the second task, constituted a reversal-shift
group, together with those who received number-
number training. Shape-number and number-shape
subjects constituted a nonreversal-shift group, and
color-shape and color-number subjects constituted
a control-shift group.

Exactly the same design applies to the observe
group, except that the confederate performed Task
1, while the naive subject simply observed his
partner's task performance. At the completion of
Task 1, the Task 1 observer then performed Task 2
while the confederate remained silent. Naive sub-
jects were led to believe that the confederate was
also naive. Each Task 1 performance of the con-
federate was a reproduction of the Task 1 per-
formance of a subject in the perform group, thus
providing past experience for each observer equiv-
alent to that of a performer in the perform group.

On Task 2, half of the subjects in each of the
conditions learned a number concept and half
learned a shape concept. When the number con-
cept was relevant, the subject was to learn to say
PEX to the presentation of any three figures, regard-
less of their shape or location, WIJ to four figures,
KUG to one figure, and MOF to two figures. The
correct pairings of the shape-stimulus values and
nonsense syllables on Task 2 were diamond-PEX,
cross-WIJ, square-KUG, and circle-MOF.

The nonreversal group consisted of two sub-
groups determined by which concept (number or
shape) was learned first. One subgroup (number-
shape) learned the number concept first, then the
shape concept in Task 2. The other nonreversal 
subgroup received the shape-number order. The 
control group was presented one color concept 
as Task 1 and received either number or shape as 
Task 2. 
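
The Task 2 pairings explain why particular shape-number combinations were excluded from the stimulus set: each excluded combination is one whose shape and number call for the same syllable, so a response learned under one concept would keep being reinforced under the other. The sketch below checks this, reading the four syllables as PEX, KUG, MOF, and WIJ and assuming that diamond pairs with PEX, the only syllable not assigned to another shape. The dictionary names are hypothetical.

```python
# Task 2 response assignments as described in the Design section.
number_response = {1: "KUG", 2: "MOF", 3: "PEX", 4: "WIJ"}
shape_response = {
    "square": "KUG",
    "circle": "MOF",
    "diamond": "PEX",  # assumed: the one syllable left over
    "cross": "WIJ",
}

# Combinations listed in Materials as never appearing together.
excluded = {("square", 1), ("circle", 2), ("diamond", 3), ("cross", 4)}

# Shape-number pairs whose two Task 2 responses coincide.
same_response = {
    (shape, number)
    for shape in shape_response
    for number in number_response
    if shape_response[shape] == number_response[number]
}

print(same_response == excluded)  # True
```

Under these readings the ambiguous pairs and the excluded pairs are the same four combinations, which matches the stated aim of eliminating continued reinforcement for the nonreversal groups.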

In order to increase the generality of the find- 
ings, three different combinations of values and 
responses were used in the reversal conditions on 
Task 1 for both shape (S1, S2, S3) and number
(N1, N2, N3). This was done to anticipate the
possibility that some within-dimension shifts might
be easier than others. Thus the reversal condition
consisted of equally allocated Task 1 to Task 2
sequences S1-S, S2-S, S3-S and N1-N, N2-N, N3-N.
Although not all of the reversal shifts employed 
involved “literal” reversals (shifts that are made 
within pairs of Task 1 associations), the authors 
follow Harrow and Buchwald (1962) in maintain- 
ing the “reversal” terminology for any shift that is 
made on the basis of the same dimension as was 
relevant on Task 1. Harrow and Buchwald, finding 
no differences between literal and nonliteral 
reversal shifts, argue that a sufficient condition for 
a reversal shift is the common relevance of the 
same dimensional values on both tasks. 


Procedure 


During the three weeks of testing, perform 
sessions were run on the first half and observe ses- 
sions on the second half of each week in roughly 
equal numbers over shift conditions. This alterna- 
tion procedure was necessary in order to obtain 
groups matched on Task 1 performance. Aside 
from this consideration, the subjects were assigned 
at random to the reversal, nonreversal, or control 
conditions. Upon arrival for testing, the subjects 
were seated and read the instructions. The experi- 
menter answered any questions that had to do with 
procedure but gave no information concerning the 
nature of the figures or the classes into which they 
were to be divided. 

In the perform condition, if the subject reached
the Task 1 criterion of 10 consecutive correct
responses within 100 trials, exclusive of criterion
trials, he was shifted to Task 2. For the subjects
in the observe condition, it was necessary to pause
long enough to notify the observer that he was now
to perform. Therefore, a 30-second interval was
introduced between Tasks 1 and 2 in both perform
and observe groups. In an attempt to reduce the
possible confounding effect of differential sets, the 
subjects in both groups were informed during the 
initial instructions and at the completion of Task 1 
that the purpose of the pause was to “change the 
slide tray.” The subjects continued on Task 2 
until they made 10 consecutive correct responses. 
Any subject who did not reach this criterion within 
100 trials, exclusive of criterion trials, was given 
a score of 100 and the trials were terminated.
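
The scoring rule described above can be written as a short function. This is a sketch of one reading of the rule, not the authors' scoring procedure: the score is taken to be the number of trials preceding the run of 10 consecutive correct responses, and a subject who never produces such a run within the recorded trials is assigned 100. The function name and argument names are hypothetical.

```python
def trials_to_criterion(responses, run_length=10, max_trials=100):
    """Return trials to criterion, exclusive of the criterion trials.

    `responses` is a sequence of booleans (True = correct response).
    """
    streak = 0
    for index, correct in enumerate(responses):
        streak = streak + 1 if correct else 0
        if streak == run_length:
            # Trials completed so far, minus the criterion run itself.
            return min(index + 1 - run_length, max_trials)
    # Criterion never reached: assign the ceiling score.
    return max_trials


print(trials_to_criterion([False] * 5 + [True] * 10))  # 5
print(trials_to_criterion([True] * 10))                # 0
print(trials_to_criterion([True, False] * 60))         # 100
```

An interrupted run does not count: nine correct responses followed by an error restart the tally, so the score reflects where the final unbroken run begins.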

Each Task 1 performer started at the beginning 
of the sequence of 80 slides. Task 2 was initiated 
just after that point in a sequence corresponding to
the number of slides used in Task 1. The subjects
matched in the perform and observe conditions 
were given identical slide sequences. 

In an attempt to discover those subjects in the 
observe condition who did not learn Task 1 and to 
compare performer-observer retention, a brief
written questionnaire consisting of two questions
was administered to all subjects immediately fol-
lowing completion of Task 2. One question was 
phrased to elicit recall of the Task 1 concept and 
the second to test recall of the correct Task 1 
pairings among the values of the relevant dimen- 
sion and the nonsense syllables. 


Subjects 


Subjects were 100 male students from the 
undergraduate introductory psychology course at 
the University of Iowa who were given course 
credit for participation. Any subject who did not 
learn Task 1 to the required criterion within 100 
trials, exclusive of criterion trials, was dropped 
from the experiment. Four subjects were elimi- 
nated for this reason. Two of these subjects were 
to have learned a color concept, one a number
concept, and one a shape concept. 

All subjects in the control group were admin-
istered the Dvorine pseudo-isochromatic plates
(color vision test) at the end of the experiment. 
Any subject who failed to respond correctly on 
more than three plates was eliminated from the 
study. Four subjects were dropped for this reason. 


RESULTS 


Task 1 

The mean number of trials required to 
learn Task 1 for subjects who were to be 
exposed to the various shift conditions, 
exclusive of criterion trials, was 40.0, 32.4, 
and 38.2 for the reversal, nonreversal, and 
control groups, respectively. The mean num- 
ber of trials for training in Task 1 on the 
various concept dimensions was 32.5, 39.9, 
and 38.2 for the shape, number, and color 
concepts, respectively. Separate analyses of 
variance revealed no differences among the 
means of either analysis. Both F ratios were
of a magnitude smaller than 1.


Task 2: Trials to Criterion 


Seven subjects failed to learn Task 2 to 
the required criterion within 100 trials. 
These subjects were given a score of 100, and 
Task 2 was terminated. Four of these sub- 
jects served in the observe condition and 
three in the perform condition. Of the ob-
servers, three were in the nonreversal condi-
tion and one in the reversal condition. Of the
performers, two were in the nonreversal
condition and one in the control condition.
Table 1 presents the mean number of trials
required to learn Task 2, exclusive of crite-
rion trials, and the standard deviations for
each subgroup. Because inspection of these
data revealed heterogeneity of variance
and some departure from normality, a more
conservative level of significance was
adopted (p < .025).

TABLE 1
MEAN NUMBER OF TRIALS TO LEARN TASK 2

                        Task 1 experience
                  Perform          Observe         Combined
Type of shift     M       SD       M       SD      M

Reversal          21.1    13.7     29.8    23.6    25.4
Nonreversal       48.3    26.9     53.3    29.1    50.8
Irrelevant        19.3    24.5     24.8    18.2    22.1
Combined          29.6             35.9

Although observers required a slightly 
greater number of trials to criterion than 
performers in all shift conditions, there was 
no reliable difference between these two conditions
(F = 2.0, df = 1/102). Type of shift
was significant (F = 17.8, df = 2/102, p <
.01). The critical difference (i.e., the minimum
difference required for significance;
Lindquist, 1953), computed for the row
means with a value of t significant at the
.025 level, was 11.8. A difference of this
size occurred in the direction of a greater 
number of trials to criterion for the non- 
reversal group than for the reversal and 
control groups, which did not differ from 
each other. The finding of no interaction between
Task 1 experience and type of shift
(F < 1.0, df = 2/102) indicated that the
differences due to type of shift were of the
same magnitude for both performers and
observers.
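
As a rough descriptive check on the no-interaction finding, the nonreversal-minus-reversal difference can be computed separately for performers and observers from the cell means of Table 1. The sketch below is illustrative only: the dictionary layout and variable names are assumptions and are not part of the original analysis.

```python
# Cell means from Table 1 (mean trials to learn Task 2).
# The data layout and names are illustrative assumptions.
table1 = {
    "perform": {"reversal": 21.1, "nonreversal": 48.3, "irrelevant": 19.3},
    "observe": {"reversal": 29.8, "nonreversal": 53.3, "irrelevant": 24.8},
}

# Nonreversal-minus-reversal difference within each Task 1 experience group.
shift_effect = {
    group: round(means["nonreversal"] - means["reversal"], 1)
    for group, means in table1.items()
}

print(shift_effect)
```

The two differences come out close to one another (about 27 and 23.5 trials), which is the pattern a nonsignificant interaction describes.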


Errors 


To reveal some of the details of Task 2 
performance, it is of value to employ a mea- 
sure that assesses the relative occurrence of 
errors on the shift tasks independently of 
the total number of trials to criterion. A 
measure suitable for this purpose was the 
ratio of errors to the total number of
responses for each subject.

The mean ratio of errors to total responses
(excluding criterion trials) for all conditions
is presented in Table 2. An analysis of
variance of these data revealed main effects
due to type of shift and Task 1 experience
(F = 17.7, df = 2/102, p < .01 and F = [illegible],
df = 1/102, p < .01, respectively). Column
means showed a lower mean error ratio for
observers; row means showed the lowest
mean error ratio for subjects in the control
condition. However, the additional finding
of significant interaction effects (F 27 [illegible],
df = 2/102, p < .01) qualifies a direct
interpretation of the main effects. The critical
difference of .079 (computed for p = 0 [illegible]),
applicable to the six cells of the table,
warrants the following statements. Performers
obtained a significantly smaller error ratio
in the control condition than in the reversal
and nonreversal conditions. Observers, on
the other hand, produced a smaller error
ratio in both the reversal and control
conditions than in the nonreversal condition.
Perhaps of greatest interest is the finding
that observers produced a significantly
smaller error ratio than performers in the
reversal condition but did not differ reliably
from performers in the nonreversal or
control conditions. Each mean of Table 2
could, of course, be converted into a
proportion of correct responses by
subtracting the error ratio from 1.
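
The error-ratio measure and its conversion to a proportion of correct responses can be sketched as follows. This is a minimal illustration, not the original scoring procedure: the function name and data layout are assumptions, only the legible cell means of Table 2 are used, and the assignment of the two columns to the perform and observe conditions is inferred from the text.

```python
def error_ratio(errors: int, responses: int) -> float:
    """Ratio of errors to total responses for one subject."""
    if responses <= 0:
        raise ValueError("responses must be positive")
    return errors / responses


# Legible cell means from Table 2 as (perform, observe); column order is inferred.
table2 = {
    "reversal": (0.693, 0.530),
    "nonreversal": (0.698, 0.654),
    "irrelevant": (0.517, 0.542),
}

# Proportion of correct responses = 1 - error ratio.
proportion_correct = {
    shift: tuple(round(1 - ratio, 3) for ratio in cells)
    for shift, cells in table2.items()
}

print(proportion_correct)
```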


Trials to successive criteria 


On the basis of the error ratio measure
used above, two events of interest occurred
that were not accounted for in
terms of the trials-to-criterion measure. A
reversal shift brought about a greater
proportion of errors for Task 1 performers than
for Task 1 observers. A reversal shift in the
perform condition also brought about a
greater proportion of errors than did the
control shift. These findings suggest that
although these groups did not differ with
respect to the attainment of a particular
criterion, differences did indeed occur during
the process of reaching that criterion.

TABLE 2
MEAN RATIO OF ERRORS TO TOTAL RESPONSES

Type of shift      Perform    Observe
Reversal            .693       .530
Nonreversal         .698       .654
Irrelevant          .517       .542
Combined            .636       [illegible]

FIGURE 1. Mean number of trials to successive
criteria during Task 2.

Figure 1 shows the number of trials to 
criterion (exclusive of criterion trials) as a 
function of criterion difficulty. The criteria
employed along the abscissa are each in
terms of number of consecutive correct responses.
To plot these curves it was necessary
to eliminate the seven subjects who
failed to reach the Task 2 criterion of 10 
consecutive correct responses. Since the pur- 
pose of the curves in Figure 1 is to describe 
concept acquisition, the inclusion of these 
subjects would be inappropriate. In addition, 
since the sample of subjects in the perform 
condition was restricted to only those subjects
who were able to learn Task 1, elimination
of all nonlearners from Task 2 provided
a comparable screening device for observers. 

Due to the nature of the trials-to-successive-criteria
measure and to the lack of cell
proportionality, no statistical analyses were 
performed on the data of Figure 1. Further- 
more, these data are intended merely as a 
descriptive commentary on the error rate 
data discussed above. The curves related to 
error rate as follows: (a) a higher average 
slope reflects a greater number of errors 
between the attainment of successive crite- 
ria; (b) positive acceleration is produced 
either by an increase or by no change in the 
number of errors between criteria; (c) both 
linearity and negative acceleration are 
brought about by a reduction in the number 
of errors between criteria; and (d) leveling 
of the curves occurs where the error rate is 
reduced to zero (e.g., a jump from 6 consecutive
correct responses to 10 consecutive
correct).
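
The trials-to-successive-criteria measure can be illustrated with a short sketch that scores a single subject's response sequence. The function name, the boolean coding of responses, and the example sequence are all hypothetical; the only thing taken from the text is the definition of each criterion as a run of consecutive correct responses, counted exclusive of the criterion trials themselves.

```python
from typing import Optional, Sequence


def trials_to_criterion(responses: Sequence[bool], k: int) -> Optional[int]:
    """Trials preceding the first run of k consecutive correct responses.

    Criterion trials are excluded from the count. Returns None if the
    criterion is never reached.
    """
    run = 0
    for i, correct in enumerate(responses):
        run = run + 1 if correct else 0
        if run == k:
            return i + 1 - k
    return None


# Hypothetical response sequence: True = correct, False = error.
sequence = [False, True, False, True, True, False, True, True, True]

# One point per criterion, as plotted along the abscissa of Figure 1.
curve = [trials_to_criterion(sequence, k) for k in (1, 2, 3)]
print(curve)
```

Each error between two successive criteria pushes the later criterion further out, which is why a steeper segment of the curve corresponds to more errors between criteria.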

The most apparent observation from 
Figure 1 is the relatively high acquisition 
slopes for the nonreversal groups, indicating 
much slower progress toward successive cri- 
teria throughout. In addition, a higher cri- 
terion is required for nonreversal subjects 
before any appreciable error reduction 
(negative acceleration) takes place. 

The finding of negative transfer for the 
error rate measure for reversal performers 
with respect to the control performers ap- 
pears to be consistently the case at all cri- 
terion levels of Figure 1 (solid lines in 
lower portion of figure). The associated 
finding of no error rate difference for re- 
versal and control observers is shown by the 
broken lines of Figure 1, where observers 
exhibit either positive or essentially zero 
transfer throughout. 

Finally, it is of interest to compare per- 
former-observer differences in reversal-shift 
performance. The reliable finding of a 
greater error rate for performers than ob- 
servers is reflected in the open-dot curves 
of Figure 1 by two observations: (a) a 
greater absolute number of errors for performers
than observers at the lower criteria
and (b) a faster rate of error reduction between
criteria for performers than for observers.
Observers appear then to have the
edge on performers, in terms of fewer errors,
only in the beginning phases of reversal
transfer.

TABLE 3
NUMBER OF SUBJECTS SHOWING RECALL OF THE
RELEVANT TASK 1 DIMENSION AND OF ALL
THE TASK 1 STIMULUS-RESPONSE
ASSOCIATIONS

                          Task 1 experience
Type of shift         Perform                   Observe
                 Dimension  Association    Dimension  Association
Reversal            16          10            16          10
Nonreversal         18           8            17          11
Irrelevant          18          12            13           8

Note. n = 18 per group.

In comparing performers with observers, 
the curves strongly suggest divergence in the 
control condition (solid dots of Figure 1). 
As the control shift was irrelevant to Task 2 
requirements, this divergence is likely due to 
performer-observer differences in the trans- 
fer of nonspecific factors. That is, the posi- 
tive advantage for performers over ob- 
servers of such factors as visual-vocal 
coordination and temporal adjustment ap- 
pears to increase as the difficulty of the 
transfer criteria increases. 


Recall 


Following the shift task, the subjects were 
administered a questionnaire in order to 
test recall of Task 1 training. The first item, 
which stipulated that the subject was to 
“explain in general terms how the slides 
were classified . . . before the pause,” tested 
recall of the Task 1 relevant dimension. Re- 
call of the specific Task 1 stimulus-response 
pairings was tested by asking the subject to 
“try to recall which nonsense words were 
associated with which slide categories before 
the pause.” 

Table 3 shows the number of subjects 
(n = 18 per group) in all 12 conditions who 
showed correct recall of the Task 1 relevant
dimension and of the Task 1 associations.
For performers, recall is uniform
across type of interpolated shift. Dimension
recall is close to 100%; in contrast, only
about 50% of the subjects showed perfect
association recall.

Observers, who merely observed Task 1,
showed essentially no differences from performers
in absolute recall of either the Task
1 relevant dimension or associations, under
conditions of reversal- or nonreversal-shift
interpolation.


DISCUSSION


Direct training 


The finding of a clear superiority of reversal
over nonreversal shifts agrees with
previous studies, which have used Task 1
performers only, and supports a mediation
theory of concept transfer (Kendler &
Kendler, 1968). In particular, this result
conforms to previous findings of studies
employing similar four-category reversal
and nonreversal shifts (Harrow & Buchwald,
1962; Walther, 1962).

As noted earlier, Isaacs and Duncan 
(1962) demonstrated negative transfer for 
both reversal and nonreversal subjects, rela- 
tive to a control group given irrelevant Task 
1 training. In terms of the trials-to-criterion 
measure, the present study revealed nega- 
tive transfer for the nonreversal group only, 
as did the Walther (1962) study. However, 
when a more discriminating error measure 
was employed for performers’ data, negative 
transfer was indicated for reversal subjects, 
as well as for nonreversal subjects. On a 
specific stimulus-overt response association 
level, then, reversal and nonreversal per- 
formers show negative transfer of Task 1 
associations, as expected from interference 
theory in verbal learning (Postman, 1961). 
Such low-level interference occurs despite 
the large concept-transfer effect, as indi- 
cated by the reversal-nonreversal differ- 
ences. 

Finally, the reversal-nonreversal concept-transfer
effect must be attributed to negative
transfer of the nonreversal group, as
opposed to positive transfer of the reversal
group. That is, since transfer for reversal
subjects was either zero (for trials to criterion)
or negative (for error rate) with respect
to the control group, the inferior performance
of nonreversal subjects must be
attributed to the concept-shift difficulty of
that condition rather than to any facilitation
due to reversal shift.

It is interesting to note that if the observer
control group is regarded as a “control”
for the performer control group, the
factors responsible for nonspecific transfer
may be further broken down. The lack of
a significant (Task 2) difference between
these two control groups (19.3 trials to criterion
for performers and 24.8 trials to criterion
for observers), in the face of original
Task 1 performance for the comparable 
shape and number (performer) groups 
(39.6 trials to criterion), strongly indicates 
that “performance set” itself is a very weak 
component of the learning-to-learn, nonspe- 
cific transfer effect. 


Observational training 


The finding that observation of Task 1 
leads to superiority of reversal over non- 
reversal shifts supports the hypothesis that 
concept learning can occur by observation. 
A further finding of some interest is that 
Task 1 experience did not interact with type 
of shift on the final trials-to-criterion mea- 
sure. That is, the difference between reversal 
and nonreversal groups for observers was 
of the same magnitude as that for performers.
Conceptual responses acquired by
observation, then, appear to be no less potent
than those acquired by direct performance.

Although observers and performers did 
not differ on the final trials-to-criterion 
measure, observers showed a reliably smaller 
overall error ratio than performers on the 
transfer task. In addition, their patterns of 
error rates differed. While performers, as 
noted above, showed negative transfer for 
both reversal and nonreversal shifts relative 
to their control-shift error rate, observers 
did not show this negative transfer in the 
reversal condition. Moreover, the smaller 
error rate for observers was primarily at- 
tributed to the reversal-shift condition. The 
second prediction, then, that observers
would undergo less associative interference,
was confirmed.

To summarize, the error analysis revealed 
that relative to performers, observers dem- 
onstrated an immunity to perseveration of 
Task 1 associations, especially at the beginning
of transfer, where specific interference
effects should be greatest. This evidence is 
viewed as support for the thesis that re- 
sponses made by performers are more 
resistant to extinction than are the covert 
counterparts of these responses made by 
observers. More precisely, whereas a per- 
former must extinguish both the covert and 
the overt components of his response, an 
observer need only extinguish a covert re- 
sponse. An alternate interpretation is that 
observers did not learn the Task 1 associa- 
tions as well as did the performers and, 
therefore, experienced less interference as a 
result. The finding that observers did not 
differ from performers in overall recall of 
these associations would not favor the latter 
interpretation. 

In conclusion, although concept attain- 
ment occurs quite as readily under observa- 
tional as under direct training conditions, 
observational training appears to provide a 
certain “flexibility” in the attained concept. 
This “flexibility” is reflected in a detach- 
ment of the concept from its specific learn- 
ing conditions. 
ERRATUM


In the article "Effect of Anxiety, Response Mode, Subject Matter Familiarity,
and Program Length on Achievement in Computer-Assisted Learning"
by Barbara L. Leherissey, Harold F. O'Neil, Jr., Darlene L. Heinrich, and
Duncan N. Hansen, which appeared in the June 1973, Volume 64 issue of the
Journal of Educational Psychology, two of the four groups in Figure 6 were
incorrectly labeled: The Constructed Response-Long group should have been
labeled Constructed Response-Short and the Constructed Response-Short
group should have been labeled Constructed Response-Long. In addition, the
text that refers to Figure 6 should be changed to reflect this correction. These 
changes do not affect the remaining results and discussion sections. 
EFFECT OF TYPE OF OBJECTIVE, LEVEL OF TEST QUESTIONS,
AND THE JUDGED IMPORTANCE OF TESTED MATERIALS
UPON POSTTEST PERFORMANCE


ORPHA K. DUELL¹
Wichita State University


The purpose was to discover under what conditions providing be- 
havioral objectives during study improves the amount learned. Col- 
lege seniors (n = 80 and n = 87) participated in two separate experi- 
ments. The findings support the hypothesis that providing students 
with behavioral objectives during study produces greater learning 
only if the objectives direct the student to learn information which
he would not classify as important or likely to be tested.


Some investigations which have studied 
the effects on learning of providing students 
with behavioral objectives as they study 
text materials have shown learning ad- 
vantages for students provided behavioral 
objectives (Morse & Tillman, 1972; Roth- 
kopf, 1972; Rothkopf & Kaplan, 1972; 
Webb & Cormier, 1972), while others do 
not (Jenkins & Deno, 1971; Merrill & 
Towle, 1972; Oswald & Fletcher, 1970; 
Yelon & Schmidt, 1971). Because of the 
mixture of results, it would seem that pro- 
viding behavioral objectives is advantage- 
ous under some circumstances but not all. A 
variable that might be a controlling factor 
is the test expectations which subjects 
bring to the learning situation. A student’s 
test expectations would influence what is 
learned by determining what parts of the 
text the student chooses to actively process 
and what he ignores. This assumes that as 
long as the teacher is not providing assign- 
ments, tests, and/or questions which force 
the active processing of materials, the stu- 
dent either consciously or unconsciously 
chooses what he will actively process. 

Student expectations would logically be
influenced by factors such as past experience
with this teacher, this type of course, and
this type of text, as well as what type of test 
the teacher says he will give and the type of 
test previous students say this teacher gives. 
¹ Requests for reprints should be sent to Orpha
K. Duell, College of Education, Wichita State
University, Wichita, Kansas 67208.


One characteristic of the textual material 
which might affect test expectations and, 
therefore, the effectiveness of behavioral 
objectives is the agreement between the 
emphasis placed upon information within 
the text and what is included in the be- 
havioral objectives. For example, consider a 
passage that introduces four concepts: one is
expanded upon through examples and de- 
scriptions, while the others are barely men- 
tioned in passing. If the behavioral objec- 
tive concerns the concept with which the 
majority of the passage deals, the behavioral 
objective tells the student nothing more than 
is already indicated by the reading passage; 
however, if the behavioral objective includes 
the major concept plus one of the other three 
concepts, the student may benefit from 
having the objective, since it provides 
guides that differ from the ones provided by 
the text. 

Yet another factor that may affect the 
learner's test expectations is his past ex- 
perience with tests. Reviews of teacher- 
made tests indicate that most of the items 
(90%-95%) require rote recall of specific
information (Thorndike & Hagen, 1969), 
despite the fact that most teachers would
agree that “higher level” behaviors, as
defined by Bloom (1956), are more important
goals. This would lead to the prediction
that students not provided with orienting
directions such as behavioral objectives
would probably expect to be given recall 
questions and would not expect higher level 
questions over text material. 
EXPERIMENT 1


The first study was designed to investi- 
gate the joint effects of the level of the test 
questions and the availability of behavioral 
objectives on learning from written prose. 
The prediction was that the difference be- 
tween the behavioral objective and non- 
behavioral objective groups would be non- 
significant for recall questions but significant 
for application questions. Each reading 
passage was designed to introduce three 
related concepts. All concepts were defined 
and expanded by at least one example. 


Method 
Subjects 


Fifty-six college seniors enrolled in two sections 
of an educational psychology course taken just 
before student teaching were randomly assigned 
to the two experimental conditions so that ap- 
proximately half of each section was in each of the 
experimental conditions. Each experimental con- 
dition contained 28 subjects. An additional group 
of 24 students in another section of the same course 
acted as a control group. 

Materials 

Passages. Five passages were written describing
the processes of prompting, shaping, using
negative reinforcement, using advanced organizers,
and measuring test-retest reliability.
These topics were chosen because they were rele- 
vant to the course in which the students were 
enrolled. 

Each passage contained approximately 410 
words and all passages were similar in organiza- 
tion. The first paragraph contained a descriptive 
example of the process. The second paragraph 
introduced the term, its definition, and the names 
of two psychologists associated with the process. 
The names were chosen so that each name was used 
only once. The third paragraph consisted of a 
second descriptive example of the process. The 
fourth and final paragraph contained two related 
technical terms, their definitions, a descriptive 
example of each of them, and a summary sentence. 
Each time a technical term was introduced for the 
first time it was underlined. 

Posttest. Multiple-choice items with four alter- 
natives were prepared from the content of each of 
the five reading passages. The test items and the 
letter of the correct choice were randomly ordered. 

Two levels of test questions were used. Recogni- 
tion questions were questions which required the 
student to recognize something that he had read 
in the passage. Three recognition questions were 
constructed for each reading passage. Two re- 
quired the student to choose definitions. The 
terms to be defined were the major term introduced 
by the passage and one of the two related technical
terms from the fourth paragraph. The related
term was randomly chosen so that for two of the
passages the first term introduced was tested,
while the second term was tested for the other
three. The third recognition question named the
process described in the passage and asked the
student to choose the psychologist's name associated
with it. Since two psychologists were named
in each passage, the name to be tested was randomly
chosen so that the first name was tested for
three of the passages and the second name for the
other two.

Three application questions were prepared for
each reading passage. In each, the student chose
the situation which was an example of the process
named in the question. All alternatives described
situations different from those in the reading
passages. All three questions dealt with the major
process introduced in the reading passage.

Objectives. The behavioral objectives were
written in the style described by Mager (1962)
except they contained no minimal standards. The
following is an example objective and a portion of
the preceding directions:


When you have finished studying the passages
you will be given a 30-item multiple-choice test
to determine if you can do the following activities:
1. Choose from among four example situations
(that are different from those described in the
reading passages) the one that is an example
of a specified process. In each question the
specified process will be one of the following:
    advanced organizers
    negative reinforcement
    prompting
    shaping
    test-retest reliability.


Two additional objectives were included, one for
the items on psychologists' names and one for the
items requiring definitions. Their form was similar
to that used in the example.

Students given nonbehavioral objectives were
told, “you will be given a 30-item multiple-choice
test to determine if you know and understand the
material in the reading passages.” This nonbehavioral
objective was thought to be similar to
what students are frequently told or is inferred by
them when given reading assignments.


Procedure 


Students came to class at the regular hour with
no prior knowledge of the experiment. They were
given manila envelopes bearing their names. Each
packet contained the reading passages and one of 
the two sets of directions. Both sets of directions 
told the student that he would be held responsible 
for the materials for the class in which he was en- 
rolled, that he could use as much of the two-hour 
class period as he needed, that he could make 
marks on the reading passages, and that he could 
refer back to the directions during study. These 
instructions were followed by either the behavioral 
or nonbehavioral objectives, depending on the 
condition to which the student was randomly as- 
signed. Within each condition, the five reading 
passages appeared in three different random 
orders. The students were asked to record the time 
on their green time sheets before they began the 
directions, after they finished the directions, and 
after they finished studying the reading passages. 
The time was written on the blackboard and it was 
changed every minute. 

When a student indicated that he was finished 
studying, the study materials were picked up and 
the student was given a manila envelope contain- 
ing a copy of the test. If the student was in the 
behavioral objective group, the objective use 
questionnaire was attached to the back of the test. 

The control group was given the test and told 
to do their best even though they had not read 
the passages the test questions covered.


Results 


Test Data 


Table 1 contains the means and standard 
deviations for the experimental groups and 
the control group subdivided by level of 
question. 

The experimental groups (behavioral and
nonbehavioral objective groups combined;
M = 23.55, SD = 4.36) did perform significantly
better (t = 10.17, df = 78, p <
.01) than the control group (M = 13.58,
SD = 3.86) on the posttest, indicating that
learning had occurred in the experimental
setting.

Three orthogonal planned comparisons 
were completed according to the procedure 
outlined in Hays (1963) but with one excep- 
tion. Since the data involved repeated meas- 
ures, the mean-square interaction term from 
a two-way analysis of variance was used in 
calculating the ts instead of the mean square 
within.² The first factor in the 4 × 28
analysis of variance consisted of the four
different experimental conditions, while the
second factor consisted of levels, where each
level was a pair of subjects, one from the
nonbehavioral objective group randomly
paired with one from the behavioral objective
group.

² This procedure was suggested by Gene V.
Glass, personal communication, January 20, 1971.


TABLE 1
MEANS AND STANDARD DEVIATIONS FOR THE
POSTTEST SCORES IN EXPERIMENT 1

Group/question        M       SD
Behavioral
    Recognition     13.07    2.11
    Application     11.07    3.02
Nonbehavioral
    Recognition     11.07    2.02
    Application     11.89    2.17
Control
    Recognition      6.58    2.10
    Application      7.00    2.30

Contrary to the predictions, two-tailed 
tests revealed there was a nonsignificant 
difference between the behavioral and nonbehavioral
objective group on the application
questions (t = 1.42, df = 81, p > .05)
and a significant difference on the recall
questions (t = 3.45, df = 81, p < .01), with
the behavioral objective group performing
significantly better than the nonbehavioral
objective group as can be seen in Table 1.
The difference between everyone on the
recognition questions and everyone on the
application questions was nonsignificant
(t = 1.44, df = 81, p > .05), although the
difference was, as one would expect, in favor 
of the recognition questions. 
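
The arithmetic behind these planned comparisons can be sketched from the cell means in Table 1. The mean-square interaction term is not reported, so the sketch backs it out from the reported t of 3.45 for the recognition comparison; that inferred value, the contrast coefficients, and all names below are assumptions made for illustration rather than the author's own computations.

```python
import math

n = 28  # subjects per experimental group

# Cell means from Table 1, in the order: behavioral-recognition,
# nonbehavioral-recognition, behavioral-application, nonbehavioral-application.
means = [13.07, 11.07, 11.07, 11.89]


def contrast_t(coefs, means, ms_error, n):
    """t for a planned comparison among cell means with equal n per cell."""
    psi = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(ms_error * sum(c * c for c in coefs) / n)
    return psi / se


# Three mutually orthogonal contrasts (assumed coefficients).
recognition = [1, -1, 0, 0]   # behavioral vs. nonbehavioral, recognition items
application = [0, 0, 1, -1]   # behavioral vs. nonbehavioral, application items
level = [1, 1, -1, -1]        # recognition vs. application, both groups

# MS interaction is not reported; infer it from the recognition t (assumption).
psi_recognition = sum(c * m for c, m in zip(recognition, means))
ms_interaction = (psi_recognition / 3.45) ** 2 * n / 2

t_application = abs(contrast_t(application, means, ms_interaction, n))
t_level = contrast_t(level, means, ms_interaction, n)
print(round(ms_interaction, 2), round(t_application, 2), round(t_level, 2))
```

Under these assumptions the inferred error term is about 4.7, and the two remaining comparisons come out near 1.41 and 1.44, in line with the nonsignificant results reported for the application and level comparisons.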


Objective Use Questionnaire Data 

Based upon their responses to the ques- 
tionnaire, students in the behavioral objec- 
tive group were classified as users or non- 
users of the objectives. Five nonusers (M = 
18.60) and 23 users (M = 25.17) were iden- 
tified. When the data were analyzed with 
the nonusers removed, exactly the same pat- 
tern of results emerged as when they were 
included. 


Time Data 

When the total amount of study time as 
measured in minutes was compared for the 
behavioral and nonbehavioral objective 
groups, students given the behavioral objectives
(M = 18.39, SD = 4.51) spent significantly
more time studying (t = 2.85,
df = 54, p < .01) than students given the
nonbehavioral objective (M = 15.29, SD =
3.59). However, when the amount of time
the students spent reading the directions
before reading the passages was subtracted
from the total study time, the difference
between the behavioral objective (M =
16.04, SD = 4.23) and the nonbehavioral
objective groups (M = 14.57, SD = 3.40)
was nonsignificant (t = 1.42, df = 54,
p > .05).
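
The two study-time comparisons can be checked from the reported means and standard deviations alone. The sketch below assumes an ordinary pooled-variance t test with 28 subjects per group (which gives the reported 54 degrees of freedom); the function name is hypothetical.

```python
import math


def t_from_summary(m1, sd1, m2, sd2, n):
    """Independent-samples t for two groups of equal size n (pooled variance)."""
    # With equal n, the pooled standard error reduces to sqrt((sd1^2 + sd2^2) / n).
    se = math.sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return (m1 - m2) / se


n = 28  # subjects per experimental group; df = 2n - 2 = 54

t_total = t_from_summary(18.39, 4.51, 15.29, 3.59, n)      # total study time
t_passages = t_from_summary(16.04, 4.23, 14.57, 3.40, n)   # directions time removed
print(round(t_total, 2), round(t_passages, 2))
```

Under this assumption the first comparison gives roughly 2.85 and the second roughly 1.43, which agrees with the reported values to within rounding.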

Discussion

It was thought that having behavioral 
objectives during study would be helpful to 
the student on questions which his past his- 
tory did not lead him to expect. Because 
most teacher-made tests are composed of 
items requiring recall of information, it was 
predicted that behavioral objectives would 
prove helpful on application questions but 
not on recognition items. Just the opposite 
was found in the experiment. Two aspects of 
the data led to the speculation that the im- 
portance of an idea, as judged by the stu- 
dents based upon the nature of the textual 
materials, was responsible for the results. 
The data showed that if the student could 
correctly identify the definition of a term, he 
was also relatively successful at answering 
the related application questions (63% of the
time all three of the related application 
questions were correctly answered; 22%, 
two of the three; 10%, one of the three; and 
5%, none of the three). If, however, the 
student was unable to correctly define the 
term, he was relatively unsuccessful at 
answering the related application questions 
(15% of the time all three of the related 
application questions were correctly answered;
19%, two of the three; 25%, one of
the three; and 41%, none of the three). This 
trend held for both the behavioral and non- 
behavioral objective groups. This suggests 
that the classification of an item of informa- 
tion as important enough to be tested was 
the important dimension rather than the 
anticipation of a type (recognition or application)
of question. It should be noted that
since this breakdown of the data involves
only one third of the recognition questions,
this relationship does not suggest that
identical findings should result from
recognition and application questions. Simi- 
lar results should be found when the recogni- 
tion questions test the same items as the 
application questions or when both are classi- 
fied as important enough to be tested, re- 
gardless of specific content. 
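
The two percentage breakdowns reported above can be condensed into a mean number of related application questions answered correctly. This is a small arithmetic sketch built only on the percentages given in the text; the variable and function names are hypothetical.

```python
# Percentage of cases with 3, 2, 1, or 0 related application questions correct,
# split by whether the definition item was answered correctly.
defined = {3: 63, 2: 22, 1: 10, 0: 5}
not_defined = {3: 15, 2: 19, 1: 25, 0: 41}


def mean_correct(distribution):
    """Mean number of application questions correct (out of three)."""
    total = sum(distribution.values())
    return sum(k * pct for k, pct in distribution.items()) / total


print(round(mean_correct(defined), 2), round(mean_correct(not_defined), 2))
```

On these figures, a correct definition goes with about 2.4 of the three application questions correct, against about 1.1 when the definition was missed.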

Further support for the notion that the 
judged importance of an item of information 
was the determining factor came from an 
inspection of the recognition data which 
indicated that the difference between the 
behavioral and nonbehavioral objective 
groups was created primarily by the one 
question from each passage requiring a psy- 
chologist's name as an answer. The differ- 
ence between the behavioral objective and 
nonbehavioral objective means of 2.00 raw 
score points when all recognition questions
were considered was reduced to .32 raw score
points when the name questions were removed.
It appears that students given the
nonbehavioral objective anticipated the definition
questions but did not anticipate the
name questions. Contextual cues may explain
this. Because the study passage was
entitled “Prompting,” not “People Who
Have Studied Prompting,” and the majority
of the text dealt with what prompting is and
its two different types instead of telling about
the psychologists, the student may have decided
that his task was to understand
prompting. Knowing that “Anderson” is
somehow connected with prompting has
very little to do with whether the process of
prompting is understood; therefore, the
student may have classified this item of
information as unimportant and hence unlikely
to be tested.


EXPERIMENT 2 


A second study was designed to test the 
hypothesis that the judged importance of an 
item of information determines whether 
knowledge of the behavioral objectives dur- 
ing training is helpful. Providing students 
with behavioral objectives during study
should improve posttest performance on
items students judge unimportant (not measuring
how well they know and understand
the tested material) but should not influence
posttest performance on items students
judge important. A delayed posttest was
included to discover whether these predicted 
differences would persist. 


Method 


Subjects 


College seniors in two sections of an educational 
psychology course were randomly assigned to the 
two experimental conditions so that approximately 
half of each section was in each condition. Of the 
32 subjects originally assigned to the behavioral 
objective condition, 2 were absent the day of the 
delayed posttest, leaving 30 subjects in the be- 
havioral objective group. Of the 34 subjects orig- 
inally assigned to the nonbehavioral objective 
group, 3 were absent the day the posttest was ad- 
ministered and 1 was randomly withdrawn to 
maintain equal ns, leaving 30 in the group. An 
additional group of 27 students in another section 
of the course was used as the control group. 


Materials 


Passages. Five passages were used.² Four were
modified versions of those used in the first study. 
The fifth was on overlearning and replaced the 
passage on test-retest reliability which had a 
higher mean correct than the other four passages 
in the first study. 

This time no underlining was used and two dates
were added to the second paragraph of each passage.
The dates were chosen so that each appeared
only once. The passages remained approximately
410 words in length. 

Posttest. The test consisted of four-alternative 
multiple-choice items which were randomly or- 
dered. The letter of the correct choice was also 
randomly ordered. 

As in the previous study, the test consisted of
both recognition and application questions. However,
this time two types of recognition items were
used. One set of three recognition items was prepared
for each passage that would probably be classified
as “not measuring how well I grasped the material”
(unimportant). These consisted of two
items requiring psychologists' names and a
third item requiring one of the two dates as
an answer. The tested dates were chosen so that for
three of the passages the date presented first in
the reading was tested, while the second date was
tested in the remaining two passages.

² The passages, test, and questionnaires have
been deposited with the National Auxiliary Publications
Service. Order NAPS Document No.
[illegible] from ASIS/NAPS, c/o Microfiche Publications,
305 East 46th Street, New York, New York
10017. Remit with order for each NAPS document
number, $1.50 for microfiche or $5.90 for photocopies.
Make checks payable to Microfiche Publications.

A second set of three recognition items for each 
passage required the definitions of the three terms 
introduced by the passage. Students would prob- 
ably classify these as “measuring how well I 
grasped the material” (important). 

The remaining questions were three application 
questions from each reading passage. These ques- 
tions were in the same form as the application 
questions in the first study; all dealt with the 
major process introduced by the passage, and all 
included descriptions of situations different from 
those described in the reading passage. 

Where possible, test items from the previous 
study were used for the second study. Items were 
revised to make the incorrect alternatives more 
plausible when item analysis suggested such re- 
visions. 

Objectives. The behavioral objectives were the 
same as in the first study with one modification 
suggested by the students to make them easier to 
understand. Each objective began with the 
phrase, “On the test you will be given” and continued
as before.

The nonbehavioral objective was the same as 
in the first experiment except now students were 
told the test was a 45-item test. 

Objective use questionnaire. The same question- 
naire was used as in the first experiment to identify 
students who did not use the behavioral objec- 
tives. 

Importance questionnaire. The importance ques- 
tionnaire had the student examine each test item 
and classify it either as “does” or “does not show 
how well you grasped the material in the reading 
passages.” This let the students judge and classify 
the tested items of information as to their importance.


Procedure 


The procedure was that used in the first experi- 
ment with the following changes. The directions 
told the students they were participating in an ex- 
periment "designed to investigate how people 
learn from written prose materials," since students
in the first experiment were convinced they
were in an experiment.

A third manila envelope was given each student 
in the experimental groups after they completed 
the posttest. This envelope contained the impor- 
tance questionnaire and a second copy of the test. 

In this experiment, the time students copied
from the blackboard was changed every 10 seconds
rather than every minute.

When the students returned to class 11 days
later following a period of observation in the
public schools, the students in the experimental
groups, without warning or without feedback from
the first test, were again given the posttest and were
asked to record the time when they began and
finished the exam.


TABLE 2


MEANS AND STANDARD DEVIATIONS FOR THE
POSTTEST SCORES IN EXPERIMENT 2


                               Posttest        Delayed posttest
Group/item type                M       SD        M       SD
Behavioral
   Unimportant recognition     9.97    3.81      6.30    2.22
   Important recognition      12.60    1.96     11.66    1.96
   Important application      11.83    2.32     12.40    2.03
Nonbehavioral
   Unimportant recognition     6.17    3.33      4.30    2.28
   Important recognition      12.50    1.81     11.57    2.31
   Important application      11.87    2.91     11.83    2.76


Results


Importance Questionnaire Data 


As anticipated, a definite pattern emerged 
from the importance questionnaire data. 
Application items were judged important by 
most students. The percentage of students 
judging single application items as important 
ranged from 68% to 98% (Mdn = 89.25%).
Recognition items requiring definitions were 
also judged important, with individual item 
agreement ranging from 80% to 98% (Mdn 
= 94.38%). On the other hand, as antici- 
pated, recognition questions requiring dates 
and names were classified as unimportant. 
The percentage of students judging single
name items as unimportant ranged from 72%
to 82% (Mdn = 77.17%), while the date
questions ranged from 89% to 91% (Mdn =
89.13%).

Based upon the importance questionnaire 
data, the posttest questions were subdivided 
into three types of items: one third were un- 
important recognition items (including both 
name and date questions), one third were 

important recognition items (the definition 
items), and the remaining third were im- 
portant application items. 
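
The classification step described above can be sketched in a few lines. The sketch computes, for each test item, the percentage of students who judged it important and labels the item by majority judgment. The item names and the 0/1 responses are invented placeholders, and the simple-majority cutoff is an assumption; the article reports only the resulting percentage ranges and medians.

```python
from statistics import median


def percent_important(judgments):
    """Percentage of students who marked an item 'does show how well
    you grasped the material' (coded 1) rather than 'does not' (coded 0)."""
    return 100.0 * sum(judgments) / len(judgments)


def classify_items(responses, cutoff=50.0):
    """Label each item by majority judgment.

    The 50% cutoff is an assumption made for illustration; it is not
    a rule stated in the article.
    """
    labels = {}
    for item, judgments in responses.items():
        pct = percent_important(judgments)
        label = "important" if pct > cutoff else "unimportant"
        labels[item] = (label, pct)
    return labels


# Hypothetical responses from 10 students (not data from the study).
responses = {
    "definition_1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "application_1": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "name_1": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "date_1": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
}

labels = classify_items(responses)
# Median agreement across the items judged important, analogous to the
# Mdn values reported for each item type.
median_important = median(
    pct for label, pct in labels.values() if label == "important"
)
```

Grouping items by their majority label in this way yields the three item types used in the analyses that follow.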


Test Data 


The means and standard deviations for the 
experimental groups subdivided by the three
item types are given in Table 2. A two-tailed
comparison of the control group (M = 19.59,
SD = 3.18) and the two experimental groups
combined (M = 32.47, SD = 6.65) indicated
that the experimental groups performed
significantly better (t = 12.26, df =
85, p < .01), supporting the notion that
learning had occurred. The means of the
control group on the individual item types
(unimportant recognition, 3.40; important
recognition, 8.07; and important application,
8.03) show improvement occurred on all item
types.

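
The combined experimental mean can be checked against the posttest cell means in Table 2. The short sketch below sums the three item-type means for each group and averages the two totals; the variable names are illustrative, and the equal weighting assumes the two groups of 30 reported in the Subjects section.

```python
# Posttest cell means as printed in Table 2.
behavioral = {
    "unimportant recognition": 9.97,
    "important recognition": 12.60,
    "important application": 11.83,
}
nonbehavioral = {
    "unimportant recognition": 6.17,
    "important recognition": 12.50,
    "important application": 11.87,
}


def total_mean(cells):
    """Mean total score is the sum of the item-type means."""
    return sum(cells.values())


def combined_mean(total_a, total_b):
    """Average of two group totals, assuming equal group sizes."""
    return (total_a + total_b) / 2


behavioral_total = total_mean(behavioral)
nonbehavioral_total = total_mean(nonbehavioral)
experimental_mean = combined_mean(behavioral_total, nonbehavioral_total)
```

Rounded to two decimals, the result agrees with the combined experimental mean of 32.47 given in the text.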
Nine two-tailed orthogonal comparisons 
were made using the same procedure out- 
lined in the first experiment. As predicted on 
the posttest, the behavioral objective group 
performed significantly better than the nonbehavioral
objective group on the unimportant
recognition questions (t = 6.91, df =
319, p < .01), while the differences between
these two treatment groups on the important
recognition questions (t = .30, df = 319,
p > .05) and the important application
questions (t = .06, df = 319, p > .05) were
nonsignificant.

On the delayed posttest, the same pattern
of results held in that the differences between
the behavioral and nonbehavioral
groups on the important recognition questions
(t = .18, df = 319, p > .05) and the
important application questions (t = 1.00,
df = 319, p > .05) were nonsignificant,
while the behavioral objective group performed
significantly better than the nonbehavioral
objective group on the unimportant
recognition questions (t = 3.64, df = 319,
p < .01).

Posttest-delayed-posttest comparisons revealed
significant losses for the behavioral
and nonbehavioral groups combined on the
unimportant recognition questions (t = [illegible],
df = 319, p < .01) and the important recognition
questions (t = 2.40, df = 319, p <
.05), but no significant loss on the important
application questions (t = .69, df = 319,
p > .05), where a slight gain was shown due
to the behavioral objective group.


Objective Use Questionnaire Data 


Only 1 subject of the 30 in the behavioral
objective group reported that he had not
used the objectives. Two subjects failed to
fill in the questionnaire. The data analyzed
with these subjects removed resulted in the 
same pattern of results as are reported for 
the complete sample. 


Time Data 


When total study time on the passages 
was compared, the behavioral objective 
group (M = 25.60, SD = 8.31) spent significantly
more minutes on the task (t =
2.88, df = 58, p < .01) than the nonbehavioral
objective group (M = 19.66, SD =
7.60). However, when the length of time
spent initially reading the directions was
subtracted from the total study time, the
difference between the behavioral (M =
22.74, SD = 8.25) and the nonbehavioral
(M = 18.79, SD = 7.55) groups was nonsignificant
(t = 1.94, df = 58, p > .05).
The differences between the treatments in
the time spent taking the posttest (t = .03,
df = 58, p > .05) and the delayed posttest 
(t = .29, df = 58, p > .05) were both non- 
significant. 
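
The study-time comparisons can be checked from the reported summary statistics alone. The sketch below assumes an independent-groups t test with pooled variance and 30 subjects per group (the article does not name the exact procedure), and takes the reported values as t = 2.88 for total study time and M = 22.74 for the adjusted behavioral mean. The function name is illustrative.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Two-sample t statistic from means, SDs, and group sizes,
    using the pooled-variance estimate. Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Total study time, behavioral vs. nonbehavioral objective groups.
t_total, df_total = pooled_t(25.60, 8.31, 30, 19.66, 7.60, 30)

# Study time after subtracting time spent reading the directions.
t_adjusted, df_adjusted = pooled_t(22.74, 8.25, 30, 18.79, 7.55, 30)
```

Under these assumptions both statistics come out within about .01 of the reported values, with df = 58 in each case, and the adjusted comparison falls just short of the two-tailed .05 criterion, as the text states.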


Discussion 


Much of school learning involves learning 
from meaningful prose. Therefore, it is important
to find ways to increase the prob-
ability that students will process the parts 
of the material relevant to the task set for 
them. Behavioral objectives are one way to 
guide the learner's attention; however, 
previous studies did not consistently show 
learning advantages for students provided 
objectives. The data of this study support 
the hypothesis that the judged importance 
of an item of information determines whether 
knowledge of behavioral objectives during 
training is helpful. Students receiving be- 
havioral objectives during study performed 
significantly better on test questions that the 
majority of the students classified as unimportant
than students given a nonbehavioral
objective. This difference appeared in both
the posttest and the delayed posttest data,
indicating that the difference remained 11
days later. Students given behavioral objectives,
on test questions that the majority of
the students classified as important, did not
perform significantly differently from students
given the nonbehavioral objective on
either the posttest or the delayed posttest.
This was true for both application- and
recall-level questions. 

Other ways of guiding the learner's atten-
tion to specific parts of prose passages may 
very well operate in a similar manner. Such 
techniques should follow the hypothesis 
used in this study for behavioral objectives 
only if they consistently set off information 
needed when the learner is tested and if the 
learner is aware of this relationship. The 
underlining in the passages used for the first 
experiment was not accompanied with direc- 
tions stating that the underlined words were 
items that would be tested. Not all items that 
were underlined were tested and some items 
that were not underlined were tested; 
therefore such underlining does not meet the 
conditions described above. At least one 
study has shown benefits from the experi- 
menter underlining when directions tell 
students they will be tested over the under- 
lined portions (Crouse & Idstein, 1972). 

An important question that remains to be 
investigated is whether the hypothesis sup- 
ported by these studies will generalize to 
other materials. The length of the study 
passage, the number of ideas presented, and 
the length of study time are variables that 
might affect the generalizability of the hy- 
pothesis. 

As in the first experiment, the time data 
revealed that the behavioral objective group
spent more time on the total task than the 
nonbehavioral objective group. Since the 
directions to the behavioral objective group 
were much longer, the time the students 
spent initially reading the instructions was 
subtracted from the total study time. When 
the adjusted scores were compared, the dif- 
ference between the two groups was nonsig- 
nificant in both studies. Although no differ- 
ence was found in the length of study time, 
the groups may have used their time differ- 
ently. Logically, behavioral objective stu- 
dents might spend some of their time going 
back and rereading parts of the objectives, 
while nonbehavioral objective students 
would have less reason to do this. Some 
students were observed following this logic; 
however, there was no way to measure these 
behaviors objectively. If these patterns of
behavior were fairly general, it is possible the 
behavioral objective group spent less of their
study time on the reading passages than the
nonbehavioral objective group. 

As expected, students performed less well 
11 days later than on the immediate posttest.
This finding held for all question types with 
one exception, the important application 
questions, in which no significant difference 
was found. This was not predicted and needs 
further investigation to see if similar results 
can be obtained again.


Two groups of 108 fifth-grade children were presented with illustrated
or nonillustrated incidental information that served as content
material for a spelling and a grammar exercise. Incidental learning was
determined by two tests measuring retention of the content material.
The incidental-learning test performance of good and poor readers
and scores obtained on recognition versus recall questions were also
compared. Results indicated that (a) illustration facilitated incidental-
learning retention, (b) good readers retain more incidental learning
than poor readers, and (c) more recognition than recall questions
were answered correctly. The facilitating effect of illustration was
explained according to an imagery hypothesis.

¹ This article is based on a doctoral dissertation
[illegible] College, Columbia University
[illegible] chairman [illegible].


The primary purpose of this study was to
determine whether imagery has the same
enhancing effects in the classroom upon incidental
learning as that found in intentional
learning. Studies have repeatedly
demonstrated that intentional learning, in
which specific instructions to learn are
given, is a far superior form of learning
than incidental learning (Brown, 1954; Jenkins,
1933; Postman, Adams, & Bohm,
1956). The image-evoking ability of words
has been shown to facilitate the retention of
intentionally learned verbal material. The
ease of visual representation of words
ranges from abstract nouns, to concrete
nouns, to pictures and objects, in increasing
order of concreteness (Paivio, 1969). Paivio
proposed a two-process theory to explain
why imagery actually works. He suggested
that stimuli such as concrete words produce
both verbal and perceptual codes since their
meaning is derived through association with
both concrete objects and other words. Abstract
terms such as “justice,” however, derive
their meaning primarily through verbal
experiences and are, therefore, likely to 
arouse only verbal storage codes. The two- 
process theory thus identifies imagery ef- 
fects with amount of information stored as 
a result of its additional visual code. 

Imagery, as used in this study, refers to 
the illustration of learning materials that 
are incidental to a central learning task. 
These illustrations may thereby directly 
create an additional visual representation 
of the incidental stimuli, which should fa- 
cilitate incidental-learning retention, ac- 
cording to Paivio’s (1969) theory. 

The importance of this study lies in its 
practical application to curriculum design. 
The implication is that one subject area can 
be included as ancillary content material 
for an entirely different lesson. The end re- 
sult is a coordinated curriculum in which 
the learning of two or more areas is 
achieved at little additional investment of 
cost or time. 

The relationship between reading ability 
and incidental learning and a comparison of
recall versus recognition scores obtained on 
incidental-learning tests were also included 
in the present investigation. The purpose 
was to broaden the scope of the study while 
ascertaining whether the effects of illustration
are comparable among children of different
reading levels and on two types of
response scores. 

Finally, a substudy compared the reten- 
tion of material learned intentionally or in- 
cidentally. Previous research has tended to 
use word lists or paired-associates as exper- 
imental materials. The present study, how- 
ever, sought to determine whether the oft- 
found superiority of intentional over inci- 
dental learning would also be replicated 
when task-specific procedures and curricu- 
lar learning materials are employed. Such 
parallel results would provide justification 
for the use of curricular materials in inci- 
dental-learning studies. 


METHOD 


The research was conducted in two different 
classroom curricular areas. The incidental-learning
material was presented as part of a spelling lesson 
in Part 1 and as content material of a grammar 
exercise in Part 2. The rationale for including a 
second task was to broaden the scope of informa- 
tion employed as incidental material and to com- 
pare the results with Part 1, in which different 
procedures and learning materials were used. Part 
2 was completed approximately four to five weeks 
after Part 1. The subjects were the same for both 
parts of the study. 


Subjects 


The sample consisted of 216 fifth-grade children 
(113 boys, 103 girls). They were the students of 
10 classes selected from three public schools
located in the same area. The subjects within each 
class were paired according to reading scores ob- 
tained on the Metropolitan Achievement Test 
(Form F, Elementary Grades) which was admin- 
istered to them in the fourth grade. Each member 
of the matched pair was then randomly assigned 
to the imagery or nonimagery group in an A-B- 
B-A pattern. The imagery and nonimagery groups 
differed by the presence or absence of illustrations 
for their incidental-learning materials. The mean 
reading level for each group of 108 subjects was 
5.25. The effect of presentation method (imagery 
versus nonimagery) upon incidental-learning re- 
tention represented the major variable investigated 
in this study. 
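
The matched-pair assignment can be sketched as follows. This is one plausible reading of the A-B-B-A pattern, not the authors' documented procedure: children in a class are ranked by reading score, adjacent children form a pair, and the member sent to the imagery group alternates from pair to pair after a random starting orientation. The function name, subject labels, and scores are hypothetical.

```python
import random


def abba_assign(scores, seed=0):
    """Assign matched pairs to imagery/nonimagery in an A-B-B-A pattern.

    scores: dict mapping subject id -> reading score for one class.
    Returns (imagery, nonimagery) lists of subject ids. An unpaired
    subject left over in an odd-sized class is not assigned.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Adjacent subjects in the ranking form a matched pair.
    pairs = [ranked[i:i + 2] for i in range(0, len(ranked) - 1, 2)]
    higher_to_imagery = rng.choice([True, False])  # random start
    imagery, nonimagery = [], []
    for higher, lower in pairs:
        if higher_to_imagery:
            imagery.append(higher)
            nonimagery.append(lower)
        else:
            imagery.append(lower)
            nonimagery.append(higher)
        # Flip orientation for the next pair: A-B, then B-A, and so on.
        higher_to_imagery = not higher_to_imagery
    return imagery, nonimagery


# Hypothetical class of six children with grade-equivalent scores.
scores = {"s1": 6.8, "s2": 6.5, "s3": 5.4, "s4": 5.1, "s5": 4.2, "s6": 3.9}
imagery, nonimagery = abba_assign(scores, seed=1)
```

Alternating the orientation keeps the higher scorer of each pair from always landing in the same group, which is consistent with the equal group means reported above.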

The imagery and nonimagery groups were then 
further subdivided into poor readers and good 
readers. The reading groups were obtained by cal- 
culating the mean reading score for each of the 108 
imagery and nonimagery matched pairs. Pairs 
whose mean reading level was below the median of 
5.2 constituted the poor readers and those above 
the median, the good readers. The imagery and
nonimagery groups were thus evenly divided between
54 poor and 54 good readers. The mean
Metropolitan Achievement Test scores were 3.9
and 6.6, respectively, for both groups. Comparison
of poor and good readers on incidental-learning 
retention was the second variable examined by 
this study. 


Materials and Procedure 


Part 1. The spelling lesson consisted of 24
sentences, each containing a blank space in which 
a spelling word was to be written. Each sentence 
consisted of varying levels of information relating 
to social studies or science facts that served as the 
incidental-learning material. Two sets of identical 
spelling texts were prepared with the imagery 
group having, in addition, their incidental-learning 
material illustrated by a picture below each 
sentence that presented in visual form the informa- 
tion given in the text. For example, the spelling 
item, “Income (tax) must be paid before April 15th
in the United States,” was illustrated by a picture 
of an April calendar with the “15” prominently 
circled. 

Uniform instructions were given to each teacher 
to dictate each spelling word, the accompanying 
sentence demonstrating its usage, and finally, the 
word again. The children were then instructed to 
write each spelling word in its corresponding blank 
space. Immediately following completion of the 
spelling lesson, a test was administered to all 
subjects which measured their retention of the 
incidental-learning material. The test was com- 
posed of 12 recall (fill-in) and 12 recognition (multi- 
ple-choice) questions, each one corresponding to 
information given in 1 of the 24 sentences. The 
items were counterbalanced for sequence with 
half of the subjects tested first on the recall ques- 
tions and half on the recognition questions. One 
half hour was allotted to complete the incidental- 
learning test. Comparison of recall and recognition 
incidental-learning scores constituted the third 
variable explored by the present investigation. 

It should be noted that all incidental-learning 
material used in this study was first piloted on a
separate group of 25 intellectually gifted fifth-
graders to rule out any possibility that the infor- 
mation was already known to the subjects. 

Part 2. The incidental-learning material for the 
grammar exercise consisted of a social studies 
lesson on China and a science lesson on whales, 
each approximately one page in length. As in Part 
1, the imagery and nonimagery texts differed by
the presence or absence of illustrations.

The teachers were instructed to review with
their classes proper and common nouns and how
to recognize them in sentences. The social studies
and science lessons were then presented to the
class as practice exercises in noun recognition with
instructions to circle the proper nouns and underline
the common nouns. Forty-five minutes were
given to complete the grammar exercises. Immediately
following completion of the exercises, a
test measuring retention of the incidental-learning
material was administered to all subjects. As in 
Part 1, the test consisted of 24 questions divided 
into 12 recall and 12 recognition questions. One 
half of the questions pertained to the China lesson 
and the other half to the lesson on whales. The test 
had a 30-minute time limit. 

Certain precautions were taken to minimize the 
possibility of any subject anticipating the inci- 
dental-learning test of Part 2 as a result of his 
prior experience with Part 1. The measures taken 
included 

1. Four to five weeks had elapsed between Parts
1 and 2.

2. No materials were distributed to the teachers 
in their students’ presence as was done in Part 1. 
This procedure precluded any chance of the in- 
vestigator being seen as an associative link between 
both parts of the study. 

3. A postexperimental inquiry was conducted 
by each teacher to determine whether any subject 
anticipated the incidental-learning test or delib- 
erately rehearsed the material (Postman, 1964). 
Four subjects were eliminated from the study as 
a result of the inquiry. 



Statistical Analysis 


The research design was in the form of a 2 X 
2 X 2 analysis of variance with repeated measures 
on the first two factors (Winer, 1962). The three 
factors were Presentation Method (Imagery versus 
Nonimagery) X Type of Question (Recall versus 
Recognition) X Reading Level (Poor versus Good 
Readers). Parallel but separate analyses were per- 
formed for both parts of the study. 


SUBSTUDY


Subjects and Procedure 


An additional fifth-grade class comprised of 18 
subjects from a fourth public school constituted 
the intentional-learning group. The entire class 
was given the nonillustrated grammar exercises 
of Part 2 with the additional instructions to learn
the information while circling or underlining the
nouns. Immediately thereafter, the incidental-learning
test was administered. All time allotments
were the same as in Part 2. The incidental-learning
test scores of the intentional-learning group were
compared to the incidental-learning group (non- 
imagery group, Part 2) in order to determine the 
amount of retention occurring in the presence or 
absence of instructions to learn. 


Statistical Analysis 


An analysis of covariance, with reading achieve- 
ment as the covariate, was performed since the 
intentional- and incidental-learning groups were 
not prematched according to reading ability as in 
the main study. 


RESULTS 


The results for both parts of the main 
study are presented and analyzed together 
since the findings were so consistently simi- 
lar. 

Table 1 presents the means and standard 
deviations for all three variables. The 
higher incidental-learning scores obtained 
by the imagery group in both curricular 
areas clearly demonstrate that illustrated
incidental material was better retained than 
nonillustrated material. The means for the 
good readers were also greater than those 
for the poor readers in both subject areas. 
Finally, the subjects scored higher on inci- 
dental-learning recognition questions than 
on recall questions. 

An analysis of variance indicated that all 
main effects were highly significant while 
none of the interactions were significant. 
The F ratios for presentation method were 
9.76 (df = 1/106, p < .005) for Part 1 and 
16.25 (df = 1/106, p < .0005) for Part 2. 
The comparable F values for reading level 
were 107.75 (df = 1/106, p < .0005) and 
118.94 (df = 1/106, p < .0005). Finally, 
the F ratios for type of question were 50.19 
for Part 1 (df = 1/106, p < .0005) and 
43.08 for Part 2 (df = 1/106, p < .0005). 


TABLE 1

MEAN NUMBER OF CORRECT RESPONSES AND STANDARD DEVIATIONS FOR INCIDENTAL-LEARNING TESTS
OF MAIN STUDY

                     Presentation method                 Reading level                Type of question
Curricular area      Imagery  SD    Nonimagery  SD       Good   SD    Poor  SD        Recognition  SD    Recall  SD
Part 1, spelling     12.57    5.35  11.05       5.73     15.30  4.69  8.30  4.00      6.52         3.49  5.27    2.95
Part 2, grammar      13.36    5.39  11.27       5.50     15.79  4.17  8.82  [remaining entries illegible]

Note. The remaining Part 2 entries (the poor-reader SD and the recognition
and recall means and SDs) are illegible in the source; only the fragments
“3.05” and “236” survive and cannot be assigned to a column.


Thus, each of the three variables asserts its 
effect independently across both levels of 
the other two factors. 

An analysis of covariance for the sub- 
study yielded adjusted incidental-learning 
means of 15.27 for the intentional-learning 
group and 11.48 for the incidental-learning 
group. The difference was significant (F =
12.29, df = 1/123, p < .0005), indicating
that material which is learned intentionally 
is better retained than material which is 
learned incidentally. 


Discussion 


The findings indicate that illustration can
facilitate the retention of incidental mate- 
rial in the classroom regardless of reading 
ability or method of questioning. Further- 
more, the striking similarity of results ob- 
tained in Parts 1 and 2, despite the use of 
entirely different procedures and subject 
matter, serves to lend further credence to 
the findings and to their generalization be- 
yond the confines of the present study. 

The superiority of the good over the poor 
readers in incidental-learning retention ap- 
parently indicates that, as in intentional 
learning, the brighter students and better 
readers are usually the better incidental 
learners. The finding that more recognition 
than recall questions were answered cor- 
rectly is also in agreement with the results 
of previous studies.

The investigator attributed the effects of 
illustration to imagery. Alternative theoret- 
ical explanations should, however, also be 
considered. For example, the superiority of 
illustrations can also be attributed to their 
ability to attract and maintain the subject’s 
attention to the incidental material. The 
pictures, in effect, could serve to focus one’s 
attention on the incidental-learning facts, 
resulting in the subject spending more ef- 
fective time on the relevant information in 
the text (Simon & Jackson, 1968). A second 
explanation is that illustrations may serve 
as attention-capturing devices by arousing 
curiosity (Paradowski, 1967). 

The following postexperimental analysis 
was performed in order to clarify the theoretical
implications. It was reasoned that
the relative amount of illustration is important
according to an imagery hypothesis,
whereas it is not crucial for either of the 
two “attentional” hypotheses. For example, 
one of the incidental items of Part 1 con- 
sisted of the statement, “A (frog) has four 
toes on his front legs and five toes on his
hind legs.” The illustration accompanying
this sentence showed a detailed drawing of
a frog with four front toes and five hind
toes. For an imagery interpretation, the visual
duplication of information provided in
the text is a must. The reason for this is
that the two-process theory attributes the 
effects of imagery to the amount of infor- 
mation stored as a result of its additional 
visual code. Consequently, only a detailed 
illustration paralleling the information provided
by the text is helpful in answering the
subsequent incidental-learning test question,
“How many toes does a frog have on
his front legs?” According to an attentional
interpretation, however, a picture of only a
frog's head might have served the purpose
equally well since the illustration serves
only an attentional, rather than an information-providing,
function.

The analysis was done in the following
manner. Two psychology graduate students
were asked to rate each of the illustrated
spelling and grammar facts as to whether
they were high or low on imagery. Guidelines
for receiving a high rating included
(a) clarity of the drawing and (b) how
close it came to visually replicating the incidental-learning
information provided in
the text. A third graduate student was used
for the seven spelling and eight grammar
items on which the first two judges disagreed.

It was then hypothesized that, according 
to an imagery interpretation, a greater per- 
centage of high-imagery items should be 
passed by the imagery group than those 
items rated low on imagery. No such differ- 
ence was predicted according to the atten- 
tional hypotheses. 1 
Results showed that 57.5% of the high- 
imagery items were passed compar j 
57% for the low-imagery items. Compara? | 
tive results for grammar were 63.276 ke l 
54.1%, respectively (x? = 7.4, df = 1,P | 
01). The grammar findings thus suggest 9? $ 


IMAGERY AND INCIDENTAL LEARNING 


imagery interpretation, whereas the spelling 
results are more in agreement with an at- 
tentional explanation. 
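
The significance level reported for the grammar items can be checked from the two values given in the text (a chi-square of 7.4 with 1 degree of freedom). The following is a minimal sketch, not the authors' procedure; the function name `chi2_sf_df1` is hypothetical. It relies on the fact that a chi-square variate with 1 degree of freedom is the square of a standard normal variate.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is a standard normal variate.
    """
    return math.erfc(math.sqrt(x / 2.0))


# Values taken from the text: chi-square = 7.4, df = 1.
p_grammar = chi2_sf_df1(7.4)
print(f"p = {p_grammar:.4f}")
```

The resulting probability falls below .01, which agrees with the level reported for the grammar comparison.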

Of particular pertinence to the theoretical 
argument is a descriptive occurrence during 
the study that cannot be quantified. The 
investigator, while grading the incidental- 
learning tests, came across 15 papers in 
which the subjects actually drew their own 
illustrations near the correct answer, presumably
as a mnemonic device. Even more
dramatic were the three subjects who drew 
pictures in the answer spaces with explana- 
tory notes to the effect that they could not 
recall the answers but knew what they 
looked like. These test behavior phenomena 
point strongly to an imagery interpretation 
and support Bahrick’s (1969) findings that 
verbal and visual components of a memory 
trace can be forgotten independently. 

The practical educational implications of 
this study are numerous and varied and 
have direct application to the new instruc- 
tional technology recommended for the 
classroom (Carnegie Commission on Higher 
Education, 1972). For example, instead of 
including relatively unimportant informa- 
tion as ancillary material for spelling les- 
sons or grammar exercises, the present 
study suggests that it would be far more 
worthwhile to employ educationally in- 
formative material. Furthermore, it is pref- 
erable to include material that is being 
presented for the first time or that needs 
review rather than material with which the
students are already familiar. Illustration
would then be a means of assuring greater
incidental-learning retention as demonstrated
by the results of this investigation.

Sixty male subjects were tested, both in the a.m. and p.m., in a
Latin square design, on six cognitive tests and on the electro- 
encephalogram (EEG). Performances of overlearned repetitive tasks 
were found to be better in the morning than in the afternoon, while 
the reverse pattern of performance was found for perceptual- 
restructuring tasks. The EEG was found to be more sensitive to 
photic stimulation in the afternoon than in the morning. The findings 
were interpreted as reflecting a hormonally mediated shift from 
adrenergic to cholinergic dominance in the brain, from morning to 
afternoon. Pedagogical implications are discussed. 


This study tests the hypothesis that per- 
formances of overlearned, serially repetitive 
automatized tasks and perceptual-restruc- 
turing tasks change in opposite ways from 
morning to afternoon; that is, performances 
of overlearned repetitive tasks are expected 
to be better in the morning than in the 
afternoon and performances of perceptual-restructuring
tasks are expected to vary in
the reverse manner. Perceptual-restructuring
tasks are defined as novel tasks in
which perceptually obvious stimulus attributes
do not facilitate task solution but,
rather, act to conceal or obscure the perception
of nonobvious but correct stimulus attributes,
for example, the Embedded Figures
Test (Witkin, 1950).

The psychological and physiological
processes underlying performances of serially
repetitive automatized tasks, for example,
speed of color naming or simple additions,
and perceptual-restructuring tasks
have been the object of theoretical concern 
(Broverman, 1964; Klaiber, Broverman, & 
Kobayashi, 1967). At a psychological level 
of discourse, it has been hypothesized that 
variations in performances of overlearned
repetitive tasks are a positive function of 
the strength of response to appropriate, per- 
ceptually obvious stimulus cues. 

On the other hand, performances of perceptual-restructuring
tasks are thought to
be, in part, a negative function of the
strength of response to obvious stimulus attributes
that, in this case, are not appropriate
since they tend to obscure the correct,
nonobvious stimulus attributes. Thus, negative
intrapsychic relationships are postulated
to exist between the psychological
processes underlying performances of overlearned,
simple repetitive tasks versus performances
of perceptual-restructuring
tasks. First-order correlations between such
tasks, however, are typically positive,
though small (for example, Podell &
Phillips, 1959). The reason for the positive,
rather than the expected negative, correla- 
tions has been postulated to be due to the 
overriding effect of a general ability factor 
present in both categories of tasks (Brover- 
man & Klaiber, 1969). Demonstration of the 
postulated negative relationship may, 
therefore, require assessments of both tasks 
in the same individuals under varying con- 
ditions, when the conditions are known to 
affect at least one of the two categories of 
tasks. In this manner, the general level of 
performance of an individual is held con- 
stant, while conditions known to affect one 
category of task, for example, overlearned 
repetitive tasks, are varied. The opposite 
pattern of change in performance may then 
be predicted for the remaining category of 
task, in this case, perceptual-restructuring 
tasks; that is, if a given condition is known 
to adversely affect performances of autom- 
atized tasks, the same condition should 
enhance performances of perceptual-re- 
structuring tasks. 

Performances of such serially repetitive 
tasks as speed of color naming and simple 
addition problems have been reported to be 
better in the morning than in the afternoon 
(Hollingworth, 1914; Muscio, 1920). The 
decline in performance from morning to 
afternoon was attributed to cumulative fa- 
tigue arising from normal daily activities 
(Hollingworth, 1914). 

Therefore, since performances of simple 
repetitive tasks are known to be better in 
the morning than in the afternoon, the pres- 
ent study predicts that performances of 
perceptual-restructuring tasks will vary in 
the opposite manner; that is, perceptual- 
restructuring tasks should be performed 
better in the afternoon than in the morning. 

Physiologically, it has been suggested
that performances of simple repetitive tasks
are dependent upon central adrenergic
neural processes, while performances of perceptual-restructuring
tasks are dependent
upon central cholinergic neural processes
(Broverman, Klaiber, Kobayashi, & Vogel,
1968). This hypothesis stems from reports
that adrenergic stimulants tend to enhance
performances of overlearned repetitive
tasks and to impair performances of perceptual-restructuring
tasks, while cholinergic
stimulants have the reverse pattern
of effects (Broverman et al., 1968). Accordingly,
indices of central adrenergic functioning
should be higher in the a.m. than in
the p.m. The present study tests this hypothesis
with an electroencephalogram
(EEG) index, EEG “driving” in response
to photic stimulation. EEG driving refers
to the tendency of the EEG waves to
synchronize with a rapidly flashing bright
light placed before the closed eyes of a subject.
Adrenergic stimulants tend to diminish
this effect, while adrenergic blocking agents
tend to enhance the phenomenon (Vogel,
Broverman, Klaiber, & Kun, 1969). Therefore,
the present study predicts that significantly
less EEG driving will occur in
the morning than in the afternoon.


METHOD 


Subjects 
Sixty paid, normal male volunteer subjects
participated in the study. Because the subjects
ranged widely in age (from 12 to 50 years), they
were trichotomized into three equal-sized age
groups to check for possible age differences in
morning-to-afternoon changes in performance. The
age groupings were as follows, with 20 persons in
each group: (a) from 12 to 18 years (M = 14.0,
SD = 2.12), (b) from 19 to 23 years (M = 20.4,
SD = 1.53), and (c) from 24 to 50 years (M = 38.7,
SD = 2.74).

Subjects with known physical problems were ex- 
cluded, as were all subjects taking any sort of 
medication or who admitted to taking drugs. 


Cognitive Tasks 


Overlearned Repetitive Tasks 


Speed of color naming (Broverman, 1964). This
task requires the subject to name, as fast as
possible, 10 lines of randomly arranged red, green,
and blue color patches, 10 patches to the line. The
time taken to complete the task is taken as the
score for the test.

Speed of naming repeated objects. Subjects are
asked to name, as fast as possible, 100 randomly
arranged pictures of a cup, a tree, and a fly, 10 pictures
to the line. The time taken to complete the
task is taken as the score.

Errors on simple, serially repetitive tasks such
as these are rare. Scores corrected for errors have
been found to correlate .9* with uncorrected scores
(Stroop, 1935). Hence, errors are typically not
recorded.


Perceptual-Restructuring Tasks


Embedded Figures Test (Witkin, 1950). This 
task requires the subject to find a simple geometric 
figure embedded within a complex pattern of lines 
and colors. To make two alternative sets, 5 simple 
figures (A, C, D, E, and G) and 10 complex figures 
were used. The two sets were as follows: Set 1, 
Figures A1, C2, D2, E1, G2; and Set 2, Figures A3, 
C3, D1, E5, G1. 

The total time taken to identify each of the 
five figures in a given set was taken as the score for 
the task. A maximum of 90 seconds was allowed for 
each figure. 
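
The scoring rule just described can be written out as a short sketch. It assumes that a figure not found within the limit is entered at the 90-second ceiling, which the text implies but does not state outright; the function name and the example times are hypothetical.

```python
MAX_SECONDS_PER_FIGURE = 90  # ceiling stated in the text


def embedded_figures_score(times_seconds):
    """Total time over the five figures of one set, each capped at 90 s.

    Assumption: an unsolved figure is entered at the 90-second ceiling.
    """
    if len(times_seconds) != 5:
        raise ValueError("each set contains five figures")
    return sum(min(t, MAX_SECONDS_PER_FIGURE) for t in times_seconds)


# Hypothetical times for one subject on one set (the last figure was not found).
example_times = [12.5, 90.0, 31.0, 47.5, 120.0]
example_score = embedded_figures_score(example_times)
print(example_score)
```

Lower totals indicate faster, that is, better, performance on this task.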

Wechsler Adult Intelligence Scale (WAIS) 
Block Design subtest (Wechsler, 1955). This task, 
which requires subjects to reduce the obvious 
perceptual pattern to parts corresponding to blocks 
in order to construct the given design, has been 
reported to load on the same cognitive factor as 
the Embedded Figures Test (Broverman, 1964). 
Two sets of block designs were formed: Set 1, the 
odd-numbered designs; and Set 2, the even- 
numbered designs. 


Choice-of-Response Tasks 


Two additional tasks that permit subjects to
respond to either obvious or nonobvious embedded
stimulus attributes were also included as described
below.

Rod-and-frame test (Witkin, Lewis, Hertzman, 
Machover, Meissner, & Wapner, 1954). This test 
requires the subject to adjust a luminescent rod to 
the vertical in a darkened room when the rod is 
within a tilted luminescent square frame. "For 
successful performance of this task the subject 
must ‘extract’ the rod from the tilted frame through 
reference to body position [Witkin, et al., 1954, p. 
25]." The subject, then, must inhibit the influence 
of the frame. 

The data from the rod-and-frame situation were
scored in a manner consistent with Witkin et al.
(1954) in measuring field dependence and field
independence. This measure consists of the sum
of the absolute deviations from true vertical. In
the terms of this study, field dependence would be
analogous to heightened responsivity to obvious
visual stimuli, that is, greater response to the
tilted frame. Field independence would be analogous
to a perceptual restructuring of these visual
stimuli, that is, evidence of an ability to separate
judgments of verticality from the influence of the
tilted frame.
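
As a brief illustration of this measure, the sketch below sums absolute deviations from true vertical over a series of rod settings. The number of trials, the use of degrees, and the sign convention are assumptions made for the example; the function name is hypothetical.

```python
def rod_and_frame_score(deviations_degrees):
    """Sum of absolute deviations of the rod from true vertical.

    Larger totals correspond to greater field dependence; the sign of a
    deviation (the direction of the error) does not affect the score.
    """
    return sum(abs(d) for d in deviations_degrees)


# Hypothetical settings on four trials, in degrees from vertical.
example_settings = [-3.0, 5.5, 0.0, -1.5]
example_total = rod_and_frame_score(example_settings)
print(example_total)
```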

Rorschach Test. This test was scored for functional-integrative
responses (Phillips, Kaden, &
Waldman, 1959) which are said to measure the
subject’s ability to respond to inkblot stimuli in an
inhibitory, restructuring manner. The functional-integrative
response is scored when the percept
indicates that an interaction between two or more
conceptually independent subunits, that is, movement
of at least one object in relationship to at
least one other object, is occurring. In addition, the
percept must contain M (human movement), FM
(animal movement), or m (inanimate movement)
as a determinant.

Functional-integrative responses were given 
weighted scores according to the criteria set forth 
by Phillips et al. (1959). When the movement is 
active, a weighted score of 4 is given to a func- 
tional-integrative response containing active hu- 
man content and a weighted score of 2 is given 
to a functional-integrative response with animal 
content. When the movement is passive or 
static, a weighted score of 1 is given to functional-integrative
responses with either human or
animal content. A weighted score of 1 is given to 
functional-integrative responses with inanimate 
(neither animal nor human) content. 
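
The weighting criteria above amount to a small lookup rule, sketched here for illustration. The category labels and the function name `fi_weight` are hypothetical and are not part of the Phillips et al. (1959) system itself.

```python
CONTENTS = {"human", "animal", "inanimate"}
MOVEMENTS = {"active", "passive", "static"}


def fi_weight(content: str, movement: str) -> int:
    """Weighted score for one functional-integrative response."""
    if content not in CONTENTS:
        raise ValueError(f"unknown content: {content!r}")
    if movement not in MOVEMENTS:
        raise ValueError(f"unknown movement: {movement!r}")
    if content == "inanimate":
        return 1  # inanimate content is always weighted 1
    if movement == "active":
        return 4 if content == "human" else 2
    return 1  # passive or static movement, human or animal content


# Hypothetical protocol containing three scorable responses.
responses = [("human", "active"), ("animal", "active"), ("animal", "static")]
protocol_total = sum(fi_weight(c, m) for c, m in responses)
print(protocol_total)
```

Summing the weights over a protocol, as in the last lines, is an assumption about how the weighted scores were combined.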

The Rorschach Test was administered to the
subjects in two sets of five cards each, one at each
of two experimental sessions. The cards were
separated into sets of approximately equal stimulus
value, and average number and quality of responses,
according to Phillips and Smith (1953),
as follows: Set 1 = Cards 1, 3, 4, 6, 9; and Set 2 =
Cards 2, 5, 7, 8, 10.


EEG Photic Driving 


An EEG, with photic stimulation, was obtained 
from each subject at both of the experimental 
sessions. The EEG-driving measure was obtained 
on a Grass model 5D polygraph, with Grass model 
5P5E preamplifiers. The time constant was 2 second
and the sensitivity used was 30 microvolts per
centimeter pen deflection. Grass EC2 electrode
cream was applied under Grass E5 gold disc electrodes.
The EEG data were obtained from one
pair of occipital electrodes with each electrode
placed 3 centimeters to either side of the midline,
between the O1-O2 and P3-P4 derivations.

Recordings were obtained at rest for a period of
5 minutes and during photic stimulation. Stimulation
was performed while the subject kept his eyes
closed, using a Grass PS2 photo stimulator which
was placed parallel to the plane of the face,
centimeters from the eyes, so that both visual fields
were equally stimulated. Stimulation began at
Intensity 2 for 10 seconds, for trials of 5, 10, 15, 20,
25, and 30 flashes per second, with 20 seconds rest
between trials. Similar series of trials were given
at Intensities 4 and 8. Thus, each subject was exposed
to 18 photic stimulation trials.
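
The count of 18 trials follows from crossing the three intensities with the six flash rates. The sketch below enumerates the schedule; the total running time it computes assumes that the 20-second rest also separates the last trial of one intensity series from the first trial of the next, which the text does not specify.

```python
from itertools import product

INTENSITIES = (2, 4, 8)                  # stimulator intensity settings
FLASH_RATES_HZ = (5, 10, 15, 20, 25, 30)  # flashes per second
STIM_SECONDS = 10                        # duration of each trial
REST_SECONDS = 20                        # rest between trials

# Each intensity series runs through all six flash rates in order.
trials = list(product(INTENSITIES, FLASH_RATES_HZ))

# Assumption: the same rest applies between intensity series.
total_seconds = len(trials) * STIM_SECONDS + (len(trials) - 1) * REST_SECONDS

print(len(trials), total_seconds)
```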

An EEG-driving response to photic stimulation
was defined as two consecutive seconds of EEG
waves at the fundamental frequency of the photic
stimulation or at one of its harmonics. Driving
was scored only if the entire response, as defined,
occurred within the limits of the duration of the
photic signal and only if activity from one frequency
band, and no other, was present during the
two-second period, that is, superimposed activity
was not scored.
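
The scoring criterion can be expressed as a simple rule over a second-by-second summary of the record. This is only a sketch under stated assumptions: the record is represented as one set of frequencies (in whole cycles per second) per second of stimulation, harmonics are taken to be integer multiples of the flash rate, and the function name and the `max_harmonic` parameter are hypothetical. The original records were scored by inspection of the polygraph tracing.

```python
def is_driving(seconds, flash_rate_hz, max_harmonic=3):
    """Return True if the record meets the driving criterion.

    `seconds` holds one set per second of the photic signal; each set
    lists the frequencies present in the EEG during that second.
    """
    allowed = {flash_rate_hz * k for k in range(1, max_harmonic + 1)}
    for first, second in zip(seconds, seconds[1:]):
        single_band = len(first) == 1 and first == second
        if single_band and first <= allowed:  # subset test on sets
            return True
    return False


# Hypothetical 10-per-second trial: two clean seconds, then mixed activity.
print(is_driving([{10}, {10}, {8, 10}], flash_rate_hz=10))
```

Because only seconds inside the stimulus period are passed in, a response that begins before or outlasts the photic signal cannot qualify, and a second containing superimposed activity breaks any run.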


Procedure 


Equal numbers of subjects in each age group
were randomly assigned to Groups A and B. Group
A began the experimental procedure at 9:30 a.m.
and was tested again at 3:30 p.m., while Group B
began the experimental procedure at 3:30 p.m. and
was tested again the following morning at 9:30 a.m.
Subjects were requested to avoid all stimulants
(coffee, coke, alcohol, etc.) throughout the day
and were screened for a good previous night's rest.


Task Administration 


The two groups were subdivided into sub- 
groups (A1 and A2, B1 and B2) so that the order 
of presentation of alternate sets of various tasks 
(the Embedded Figures, the WAIS Block Designs, 
and the Rorschach Tests) could be counter- 
balanced. 

This procedure enabled variance due to dif- 
ferences in stimuli to be extracted. 
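
The counterbalancing scheme can be laid out explicitly. In the sketch below, Groups A and B differ in which session comes first, as described under Procedure; the assumption that Subgroup 1 receives Set 1 first and Subgroup 2 receives Set 2 first is made for illustration only, and the function name is hypothetical.

```python
from collections import Counter


def schedule(subgroup: str):
    """Return [(time of day, stimulus set)] for the two sessions, in order."""
    group, set_order = subgroup[0], subgroup[1]
    times = ("a.m.", "p.m.") if group == "A" else ("p.m.", "a.m.")
    sets = (1, 2) if set_order == "1" else (2, 1)  # assumed mapping
    return list(zip(times, sets))


for name in ("A1", "A2", "B1", "B2"):
    print(name, schedule(name))

# Each time-by-set pairing occurs equally often across the four subgroups.
pairings = Counter(p for name in ("A1", "A2", "B1", "B2") for p in schedule(name))
```

With equal numbers in each subgroup, every stimulus set is given equally often in the morning and in the afternoon, and equally often first and second.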

The order of presentation of the tasks to each 
subject in each of the sessions was as follows: 

1. EEG measure, 

2. rod-and-frame test, 

3. naming color hues test, 

4. Embedded Figures Test, 

5. speed of naming repeated objects test, 
6. WAIS Block Design subtest, and 

7. Rorschach Test. 

All tasks were administered in the same sequence 
to each subject in each experimental session in 
order to minimize possible effects due to order of 
administration. 


RESULTS 


Method of Data Analysis 


A mixed analysis of variance and Latin 
square method of analysis was employed. 
Because the design employs repeated measures
for each subject, the between-individuals
source of variance was first extracted.
Sequence effects related either to a.m. versus
p.m., or Set 1 versus Set 2 are included
in the between-individuals variance. Since 
these sequence effects were not of theoreti- 
cal interest, no further isolation of these 
effects was carried out. Variance due to age, 
which is also a part of between-individuals 
variance, was isolated and evaluated. 

The within-individuals variance was
analyzed for the following main effects:
morning versus afternoon; order (1st administration
versus 2nd administration, or
practice); and stimuli sets, where appropriate.
Interactions of order by morning-afternoon
and order by stimuli sets are confounded
with sequences and, accordingly,
are not interpretable; therefore, they were
not computed.

The interactions of age by morning-af- 
ternoon, age by order, and age by stimulus 
sets are not confounded and were, therefore, 
extracted. 

Square uniqueness is confounded with be- 
tween-individuals variance and, therefore, 
was not extracted. 

Table 1 presents the combined results of 
the study in summary form. Five of the six 
experimental hypotheses involving differ- 
ences in cognitive performance in the morn- 
ing versus the afternoon received statisti- 
cally significant support (only the differ- 
ences in WAIS Block Design performances 
failed to reach significance, but the means 
were in the expected direction). The physiological
EEG measure also changed significantly
in the expected direction.

The specific results are described in detail
below.


TABLE 1
SUMMARY OF RESULTS

Task and measure            Hypothesis                                            p*      Status

Overlearned repetitive
  Color naming              Subjects faster in a.m. than in p.m.                  <.05    Supported
  Repeated objects          Subjects faster in a.m. than in p.m.                  <.025   Supported
Perceptual-restructuring
  Embedded Figures          Subjects faster in p.m. than in a.m.                  <.01    Supported
  WAIS Block Designs        Subjects faster in p.m. than in a.m.                  ns      Not supported
Choice of response
  Rorschach                 More field independent scores in p.m. than in a.m.    <.03    Supported
  Rod and frame             Subjects more field independent in p.m. than in a.m.  <.06    Supported
EEG                         More EEG-driving responses in p.m. than in a.m.       <.05    Supported

* Two-tailed tests used throughout.

Table 2 presents summaries of the analy- 
ses of variance of the dependent variables, 
that is, six cognitive tasks and the EEG 
photic-driving index, and Table 3 presents 
the mean performance levels of the depend- 
ent variables in the a.m. and in the p.m. 
The significant results obtained from the 
various sources of variance are discussed 
below. 

Age. Age was significantly related to per- 
formances of both simple repetitive tasks, 
the Embedded Figures Test, and the EEG 
index. The youngest group appears to have 
performed markedly less well than the two 
older groups on these three cognitive tasks 
and to have had significantly more photic- 
driving responses than the two older groups. 

Time of day. This variable, which is the
critical variable in the study, was significantly
related to the EEG index and to all
of the cognitive tasks, except the WAIS
Block Designs, in the expected directions.
Thus, as expected from previous studies
(Hollingworth, 1914; Muscio, 1920), both
simple repetitive tasks were performed better
in the a.m. than in the p.m. The performances
of both perceptual-restructuring
tasks, however, changed from the a.m. to
the p.m. in the opposite manner; that is,
they were better in the p.m. than in the
a.m., significantly in the case of the
Embedded Figures Test and not with statistical
significance in the case of the WAIS
Block Designs.

The choice-of-response tasks also showed
the predicted significant differences in
morning versus afternoon; that is, the rod-and-frame
test indicated greater field independence
in the p.m. than in the a.m. (p <
.06), while the Rorschach showed a greater
production of functional-integrative responses
in the p.m. than in the a.m. Functional-integrative
responses require that initial
inkblot percepts be reorganized into
complex, nonobvious percepts.

Finally, more EEG photic-driving responses
occurred in the p.m. than in the
a.m., indicating a probable shift towards
cholinergic dominance in the p.m. compared
to the a.m.

Order. The second task administration
was significantly better than the first on
both simple repetitive tasks and both perceptual-restructuring
tasks. The two
choice-of-response tasks and the EEG did
not show significant order effects.

Stimuli sets. These were significant only
for the WAIS Block Designs.

Interactions. Only one noninterpretable
interaction, Age X Order on the EEG,
achieved statistical significance. The main
implication of these negative interactive results
is that the various age groups do not
appear to differ significantly in their shifts
in performance from a.m. to p.m.


DISCUSSION


The results of the study provide broad
support to the experimental hypotheses;
that is, all seven changes were in the predicted
direction and six were statistically
significant. Thus, the results tend to support
the postulated intrapsychic negative relationship
between the processes involved in
performances of automatized and perceptual-restructuring
tasks.

Similarly, the observed changes in cognitive
performances support the notion that
central adrenergic dominance exists in the
morning compared to the afternoon. Thus,
performances of automatized tasks, which
are known to be enhanced by adrenergic
stimulants and impaired by adrenergic
blocking agents, were better in the morning
than in the afternoon. Similarly, performances
of perceptual-restructuring tasks,
which are facilitated by cholinergic stimulants
and impaired by cholinergic blocking
agents, were better in the p.m. than the a.m.

The a.m.-to-p.m. changes in number of
EEG responses to photic stimulation also
support the notion of greater central adrenergic
dominance in the morning than in
the afternoon.

Hollingworth (1914) suggested that the
morning-to-afternoon deterioration in performance
of automatized tasks was due to
the cumulative fatigue of normal activity.
However, fatigue is a difficult concept to
define (Bartley, 1957; Floyd & Welford,
1953), and one might expect, at a commonsense
level, that fatigue would also impair
perceptual-restructuring tasks. The concept
of a central shift from adrenergic to cholinergic
dominance, with this shift having
different implications for automatization
versus perceptual-restructuring tasks, seems
to be an improvement over the fatigue concept
in this respect.


TABLE 3
MEANS OF DEPENDENT VARIABLES

                Overlearned repetitive   Perceptual-restructuring   Choice-of-response      EEG
Variable        Color       Object       Embedded     WAIS Block    Rod and     Rorschach   Photic
                naming      naming       Figures      Designs       frame                   driving
                (seconds)   (seconds)

Age (A)
  Old           52.02       54.57        137.5        21.42         51.35       4.02        9.43
  Middle        49.10       51.77        129.5        21.12         45.20       4.08        8.75
  Young         59.50       61.05        191.7        19.47         35.93       3.38        12.65
Time (B)
  a.m.          52.93       54.98        165.9        20.45         46.44       3.30        9.67
  p.m.          54.15       56.62        139.9        20.90         41.88       4.35        10.89
Order
  1st           54.91       58.00        173.6        20.16         46.53       3.90        10.10
  2nd           52.16       53.60        132.2        21.18         41.79       3.75        10.45
A X B
  Old, a.m.     51.40       54.40        140.2        21.35         55.60       3.40        8.50
  Old, p.m.     52.65       54.75        134.8        21.50         47.10       4.65        10.37
  Middle, a.m.  49.30       50.85        153.2        20.70         47.20       3.65        8.75
  Middle, p.m.  48.90       52.70        105.7        21.55         43.20       4.50        8.75
  Young, a.m.   58.10       59.70        204.2        19.30         36.53       2.85        11.75
  Young, p.m.   60.90       62.40        179.2        19.65         35.35       3.90        13.56
Total M         53.54       55.80        152.9        20.68         44.16       3.83
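
Because the three age groups are of equal size (20 subjects each), the a.m. and p.m. means in Table 3 should equal the unweighted averages of the corresponding Age X Time cell means. The sketch below checks this for two of the measures, using values copied from the table; the variable names and the rounding tolerance are choices made for the example.

```python
# Age X Time cell means from Table 3, ordered Old, Middle, Young.
cell_means = {
    "Embedded Figures": {"a.m.": [140.2, 153.2, 204.2],
                         "p.m.": [134.8, 105.7, 179.2]},
    "Rorschach": {"a.m.": [3.40, 3.65, 2.85],
                  "p.m.": [4.65, 4.50, 3.90]},
}

# Time-of-day means reported in the same table.
reported = {
    "Embedded Figures": {"a.m.": 165.9, "p.m.": 139.9},
    "Rorschach": {"a.m.": 3.30, "p.m.": 4.35},
}

TOLERANCE = 0.05  # allows for rounding in the printed table


def mean(values):
    return sum(values) / len(values)


consistent = all(
    abs(mean(cells) - reported[task][time]) < TOLERANCE
    for task, by_time in cell_means.items()
    for time, cells in by_time.items()
)
print(consistent)
```

The check also makes the direction of the effects visible: Embedded Figures times drop and Rorschach scores rise from a.m. to p.m. in every age group.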

However, the basis of the hypothesized 
shift from central adrenergic toward cholinergic
dominance then needs to be explained.
A hypothesis for this shift in males
has been offered (Broverman et al., 1968; 
Klaiber, Broverman, Vogel, Abraham, & 
Cone, 1971). Briefly, this hypothesis sug- 
gests that the shift is due to known diurnal 
rhythms in plasma testosterone levels 
(Resko & Eik-Nes, 1966). Plasma testoster- 
one concentrations are highest in the 
morning and slowly subside from that peak 
throughout the day. 

Anthropometric indices of testosterone
stimulation (for example, pubic hair development,
chest and biceps circumferences)
have been reported to be positively related
to performances of automatized tasks and
negatively related to performances of perceptual-restructuring
tasks in males (Broverman,
Broverman, Vogel, & Palmer, 1964;
Klaiber et al., 1967; Petersen, 1973). Infusions
of testosterone (Klaiber et al., 1971;
Vogel, Broverman, Klaiber, Abraham, &
Cone, 1971) and injections of testosterone
(Stenn, Klaiber, Vogel, & Broverman, 1972)
have positively affected performances of
automatized tasks and EEG indices
thought to reflect central adrenergic functioning.
Hence, the known morning-to-afternoon
decline in testosterone could account
for the hypothesized shift in adrenergic versus
cholinergic dominance and for the observed
changes in performances and in the
EEG.

The method by which testosterone may [...]
adrenergic functioning in the brain through
their influence upon the enzyme monoamine
oxidase (Broverman et al., 1968). Monoamine
oxidase activity plays an important
role in the intraneural regulation of monoa- 
mines believed to be neurotransmitters in 
adrenergic nerves (Kopin, 1964). Excessive 
monoamine oxidase activity could result in 
diminished neurotransmitter levels and im- 
paired central adrenergic functioning. De- 
pressed individuals have been reported to 
have grossly elevated levels of plasma 
(Klaiber, Broverman, Vogel, Kobayashi, & 
Moriarty, 1972) and platelet (Nies, Robin- 
son, Ravaris, & Davis, 1971) monoamine 
oxidase activity. Monoamine oxidase inhibi- 
tors are often used in the treatment of 
depression (Crane, 1970). Testosterone ap- 
pears to inhibit monoamine oxidase activ- 
ity; that is, hypogonadal boys had elevated 
monoamine oxidase activity compared to 
normal boys and testosterone therapy in the 
hypogonadal boys resulted in a shift to- 
wards normal levels of monoamine oxidase 
activity (Klaiber, Broverman, Vogel, & Ko- 
bayashi, 1974). Further, plasma monoa- 
mine oxidase activity has been found nega- 
tively correlated with performances of au- 
tomatized tasks and positively correlated 
with performances of perceptual-restructur- 
ing tasks (Klaiber et al., 1967). 

Hence, the differential shifts in perform- 
ances of automatized and perceptual-re- 
structuring tasks from a.m. to p.m. could be 
due to the decline in testosterone levels 
known to occur during this time. The drop 
in testosterone, in turn, would permit in- 
creased brain monoamine oxidase activity, 
a decline of available neurotransmitters in 
adrenergic nerves, impaired adrenergic 
functioning, and a shift towards cholinergic 
dominance. 

The lack of significant interactions between
age and time of day is interesting. Apparently,
the differences in maturity, which are known to
affect level of testosterone production, do not
affect diurnal shifts in the measured performances.

It is interesting to speculate on the peda- 
gogical implications of these results, at 
least for males. For instance, teaching pro- 
cedures that seem to involve automatized 
behaviors, for example, drills oriented toward
skill acquisition via sustained repetitive
practice, might best be conducted in
the morning. Similarly, such content areas 
as speed and accuracy of reading, spelling, 
and simple additions and subtractions, 
which have been found to be related to au- 
tomatization ability (Broverman, 1964; 
Mathewson, 1967), might be most effec- 
tively taught in the morning. 

On the other hand, subjects that seem to 
require cognitive-restructuring behavior 
might best be taught in the afternoon. Pro- 
ficiency in abstract thinking, mathematics, 
and in technical science courses, which have 
been related to perceptual-restructuring 
ability (Smith, 1964), might be such sub- 
jects. However, more work is needed to de- 
termine precisely which classroom subjects 
and behaviors correlate with, or involve, 
automatization and perceptual-restructur- 
ing abilities. 

Unfortunately, the present study cannot
be generalized to females. Again, more work
is needed.


COGNITIVE ABILITIES ON STATE ANXIETY AND
PERFORMANCE IN A COMPUTER-ASSISTED
INSTRUCTION TASK¹


JOE B. HANSEN²


University of Texas at Austin 


Ninety-eight undergraduate education majors received a battery of
ability tests measuring general reasoning, associative memory, and
trait anxiety (A-Trait) and were randomly assigned to three groups
(no feedback, feedback, and learner-controlled feedback) for a
computer-assisted instruction course on Xenograde systems. State anxiety
(A-State) measures were taken (a) prior to the course, (b) following
the administration of stress instructions, (c) at the mid-point, and
(d) at the end. Learner-controlled feedback subjects decreased more
in A-State than did feedback subjects. High A-State subjects made
more errors under feedback than under no feedback. Feedback
facilitated performance for high-reasoning subjects but impaired
performance for low-reasoning subjects.


Research on ability-by-treatment inter- 
actions has attempted, through the manipu- 
lation of task variables, to produce aptitude- 
by-treatment interactions by altering the 
relationship between the task and one or 
more specific abilities known to be impor- 
tant to task performance. Such studies have 
sought to establish principles that would 
lead to the development of instructional de- 
sign models more sensitive to individual 
learner differences. A separate, equally important
domain of research has dealt with
motivational factors in learning as an approach
to the general problem of individualization
of instruction. Cattell (1966)
has suggested that anxiety is a function of
unresolved doubt about an expected outcome.
If so, then providing feedback could
reduce subjects’ doubt about performance
on the learning task and, consequently, reduce
anxiety. Drive theory (Spence &
Spence, 1966) predicts that high-anxiety
subjects will perform better than low-anxi- 
ety subjects on simple tasks in which a 
single habit tendency is involved, while low- 
anxiety subjects should be superior on com- 
plex tasks in which competing habit tenden- 
cies are involved. 

Spielberger (1966) distinguishes Trait- 
Anxiety (A-Trait), a relatively permanent 
personality variable, from State-Anxiety 
(A-State), a transitory condition result- 
ing from the amount of threat perceived by 
an individual in a particular situation. The 
State-Trait Anxiety Inventory was de- 
veloped by Spielberger, Gorsuch, and Lush- 
ene (1970) as a means of measuring these 
two types of anxiety separately. 

The major objectives of this study were 
(a) to determine whether information feed-
back provided during a difficult task will
reduce A-State and (b) to determine 
whether learner control of feedback will 
lead to further reductions in A-State. An- 
other objective was to attempt to bridge 
cognitive and affective domains by exam- 
ining the relationships between the task 
variables of feedback and learner control
with anxiety and cognitive abilities, specifi-
cally reasoning ability and associative mem-
ory.
The following specific hypotheses were 
investigated in this study: 

1. Subjects who receive informative feed-
back and subjects who have learner control
over feedback show greater reductions in A- 
State during a computer-assisted instruction 
task than subjects who receive no feedback. 

2. High A-State subjects produce fewer 
errors under the feedback condition than 
under the no-feedback condition, while low 
A-State subjects produce fewer errors under 
the no-feedback than feedback condition. 

3. Subjects in the no-feedback group 
show a higher demand for both reasoning 
and associative memory abilities than sub- 
jects in either the feedback or learner con- 
trol of feedback groups; that is, the amount 
of change in error score per unit of increase 
in each of the ability scores would be greater 
in absolute terms for the no-feedback sub- 
jects than for subjects in either of the other 
groups. 


METHOD


Subjects 


The subjects were 98 undergraduate female 
education majors at the University of Texas at 
Austin who were randomly assigned to three 
groups: no feedback (n = 38), feedback (n = 31), 
and learner-controlled feedback (n = 29). 


Apparatus 


The task, a computer-assisted instruction course 
on the artificial science, Xenograde (Merrill, 
1964), was a revision of an earlier version described 
in detail by Merrill (1970) and Bunderson and 
Hansen (1972). It was presented by means of an 
IBM 1500 computer system in the computer- 
assisted instruction laboratory of the University of 
Texas at Austin. The system has eight terminals 
of the cathode ray tube type (IBM 1510). Each 
terminal is accompanied by an image projector 
(IBM 1512) for the computer-controlled presenta- 
tion of 16-millimeter transparencies. The terminals, 
each housed in an individual wooden carrel con- 
structed to provide isolation and work space for 
each student, are all located in the same room of 
the computer-assisted instruction laboratory. 


Procedure 


Approximately two weeks prior to taking the 
computer-assisted instruction course, each subject
[...] subjects. The test battery included the A-Trait
scale of the State-Trait Anxiety Inventory (S[...]
& Price, 1963); and the Bi-Column Number Series
Test (Merrill, 1970). The Ship Destinations Test
is a measure of general reasoning ability, while the
Bi-Column Number Series Test is designed to
measure the subject’s facility with the information-
processing requirements of the task. The
Object-Number and the First and Last Names 
Tests are measures of associative memory. 

As each subject reported to the terminal room,
she was assigned to a terminal and was immediately
given a 20-item paper-and-pencil version of
the A-State scale of the State-Trait Anxiety Inventory
(Spielberger, Gorsuch, & Lushene, 1970).
Following the A-State measure, subjects received
appropriate instructions on terminal operation, followed
by ego-involving stress instructions given
on-line. The stress instructions implied that the
task was an indicator of intelligence and that each
subject would be compared with other college
students. Following the stress instructions, a five-item
version of the A-State scale of the State-Trait
Anxiety Inventory was presented on-line, followed
immediately by the first example of the Xenograde
course. Figure 1 is a chart showing the task
structure.

The course contained a series of eight sets of
three examples and three test items illustrating
eight consecutive hierarchical rules comprising the
task. Following each example, three test questions
designed to test the subject's knowledge of the
exemplified rule were presented. In the no-feedback
group, subjects received no feedback following
their test-item responses. In the feedback
group, they received the words "true" or "false" as
feedback following each test item, plus a statement
of the rule following the ninth test item for
each rule. In the learner-controlled feedback condition,
subjects were required to type "y" to
receive the true-false feedback and "n" if no feedback
was desired following a test item. The learner-controlled
feedback group also received the option
of viewing the rule on the image projector following
the third test item for any example. Presentation
of the rule terminated the presentation of
examples for that rule and resulted in immediate
presentation of the first example of the next rule.

All subjects received a second five-item A-State
scale on-line, following the last test item for the
fourth rule, and a third five-item A-State scale following
the last test item for the eighth rule. In this
manner, A-State changes for each subject were
tracked over the task period.
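
The on-line sequence described in this section can be laid out as a short sketch. The Python below is illustrative only: the event names and the tuple layout are assumptions made for the example, and early termination of a rule in the learner-controlled condition is ignored. It simply enumerates the eight rules, three examples per rule, and three test items per example, with the A-State scales placed where the text puts them.

```python
from typing import List, Tuple

N_RULES = 8              # eight hierarchical rules (from the Procedure)
EXAMPLES_PER_RULE = 3    # three examples per rule
ITEMS_PER_EXAMPLE = 3    # three test questions after each example


def build_schedule() -> List[Tuple[str, int, int, int]]:
    """Return the on-line event sequence as (event, rule, example, item) tuples.

    A zero marks a field that does not apply to the event.
    Event names are hypothetical labels chosen for this sketch.
    """
    events = [("stress_instructions", 0, 0, 0), ("a_state_scale", 0, 0, 0)]
    for rule in range(1, N_RULES + 1):
        for example in range(1, EXAMPLES_PER_RULE + 1):
            events.append(("example", rule, example, 0))
            for item in range(1, ITEMS_PER_EXAMPLE + 1):
                events.append(("test_item", rule, example, item))
        # A-State is measured again after the fourth and the eighth rule.
        if rule in (4, 8):
            events.append(("a_state_scale", rule, 0, 0))
    return events


schedule = build_schedule()
items_per_rule = EXAMPLES_PER_RULE * ITEMS_PER_EXAMPLE
total_items = sum(1 for event in schedule if event[0] == "test_item")
print(items_per_rule, total_items)
```

Under these counts each rule carries nine test items, which agrees with the statement that the rule was given after "the ninth test item" in the feedback group, and a subject who saw every example answered 72 test items in all.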


RESULTS AND DISCUSSION


A varimax factor analysis conducted on the ability scores produced two
factors: reasoning and associative memory. These factor loadings are
shown in Table 1. Factor scores expressed in z score units were used as
covariables for aptitude-by-treatment-interaction analyses. The
reasoning factor score produced the strongest correlation with the
posttest (r = .3822), while memory correlated very weakly with posttest
(r = .026).


TABLE 1
VARIMAX FACTOR LOADINGS OF ABILITY TEST SCORES

                                Factor loading
Test                         Reasoning | Associative
Bi-Column Number Series        .8525   |   .0901
First and Last Names          -.0089   |   .8428
Ship Destinations              .8659   |   .0107
Object-Number                  .1076   |   .8222


[Figure 1: flowchart of the 22-step task sequence, with separate columns for the no-feedback, feedback, and learner-controlled feedback conditions.]

Figure 1. Xenograde task structure for three experimental conditions.


A-State 


A group-by-trials analysis of variance on pre- and poststress A-State
with three groups revealed that the stress instructions did indeed
produce an increment in A-State (F = 27.99, df = 1/97, p < .01). A
three-group trials-by-subjects analysis of variance using the three
poststress State-Trait Anxiety Inventory measures as repeated measures
yielded a significant groups-by-trials interaction (F = 3.39,
df = 4/190, p < .05). The learner-controlled feedback group showed the
greatest decline in A-State over the task, with the feedback group
next, while the no-feedback group remained at a relatively high level
throughout. Table 2 reveals the mean A-State scores for each group at
each of the three points in time. Figure 2 is a graph of these data.

TABLE 2
MEAN STATE ANXIETY (A-STATE) SCORES BY GROUP AND BY TRIAL

                              Trials
Group                    1       2       3
No feedback
   Mean                10.89    9.87   10.24
   SD                   2.82    3.85    4.53
   n                      38      38      38
Feedback
   Mean                11.03    9.90    8.77
   SD                   2.68    3.29    3.37
   n                      31      31      31
Learner controlled
   Mean                11.79   10.21    8.45
   SD                   2.88    3.70    3.05
   n                      29      29      29

A third analysis of variance revealed that the three groups did not
vary significantly on A-State at the beginning of the task period,
following the stress instructions. The following simple effects were
also tested: A-State at Time 1 was compared with A-State at Time 3
within each group. These data are reported in Table 3. Viewed
collectively, these data provide strong support for the hypothesis that
feedback and learner-controlled feedback result in greater A-State
reductions than does no feedback.

[Figure 2: line graph; vertical axis, A-State; horizontal axis, time of A-State measure (1, 2, 3).]

Figure 2. Mean state anxiety (A-State) by group. (Abbreviations: NF =
no feedback, FB = feedback, and LC = learner-controlled feedback.)

TABLE 3
COMPARISON OF TIME 1 AND TIME 3 MEAN STATE ANXIETY (A-STATE) SCORES BY GROUP

Group                    F           df      p
No feedback          [illegible]    1/37    <.3
Feedback               18.38        1/30    <.01
Learner controlled     36.55        1/28    <.01
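
The size of the Time 1 to Time 3 change in each group can be read directly from the means in Table 2. The short Python sketch below only restates that arithmetic; the dictionary and function names are chosen for the example and are not part of the original analysis.

```python
# Group means from Table 2 (trials 1, 2, 3 are the three on-line A-State measures).
TABLE_2_MEANS = {
    "no feedback": (10.89, 9.87, 10.24),
    "feedback": (11.03, 9.90, 8.77),
    "learner controlled": (11.79, 10.21, 8.45),
}


def change_from_first_to_last(means):
    """Trial 3 mean minus trial 1 mean, rounded to two decimals."""
    return round(means[-1] - means[0], 2)


changes = {group: change_from_first_to_last(m) for group, m in TABLE_2_MEANS.items()}
for group, delta in changes.items():
    print(f"{group}: {delta:+.2f}")
```

The mean change is -0.65 for the no-feedback group, -2.26 for the feedback group, and -3.34 for the learner-controlled group, which is the ordering of the F ratios in Table 3.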


Anxiety-by-Treatment Interactions


Multiple linear regression analysis (Bot- 
tenberg & Ward, 1963) was used to test the 
hypothesis that high A-State subjects would 
perform better than low A-State subjects 
under feedback, while the opposite results 
would occur under no feedback. The regres- 
sion of error rate on mean A-State scores 
produced an interesting though nonsignifi- 
cant anxiety-by-treatment interaction (F = 
2.69, df = 1/65, p < .11). These results are 
shown in Figure 3. The results for the no- 
feedback group conformed very well to 
predictions, while the feedback group pro- 
duced results almost the opposite of those 
predicted. 


Aptitude-by-Treatment Interactions 


In order to test the hypothesis that for
each unit of change on each of the ability
scores, the amount of change in posttest
scores would be greater for the no-feedback
condition than for either of the other condi-
tions, a two-covariable analysis of covari-
ance was conducted by means of a multiple
linear regression approach, using program
COVAR2. In this analysis, the two ability
scores were covaried simultaneously, while
posttest scores served as the dependent vari-
able.

This analysis revealed no interaction between covariables. It also
failed to produce the predicted interaction between the treatment
conditions and the two covariables combined. A significant interaction
did occur, however, between the treatments and the reasoning factor
scores. This interaction (F = 3.15, df = 2/91, p < .05) is illustrated
in Figure 4. Although statistically significant, the results were once
again virtually opposite to the direction predicted.

The COVAR2 program was written by E. E. Jennings, University of Texas
at Austin (Ward & Jennings, 1973).

The feedback condition produced the 
strongest relationship between reasoning 
ability and performance, as can be seen by 
the steeper regression slope for this group. 
No feedback showed the weakest relation- 
ship. Reasoning ability seems to be of 
greater value when feedback is present after 
every problem than when feedback is absent 
or under learner control. A single covariable 
analysis using mean errors per problem as a 
dependent variable and reasoning factor 
scores as the covariables produced highly 
similar results (F = 7.27, df = 2/92, p <
.01).
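
An aptitude-by-treatment interaction of this kind is, in regression terms, a test of whether the slope of the dependent measure on the covariable differs across groups. The sketch below is a generic heterogeneity-of-slopes F test written in plain Python; it is not the COVAR2 program, and the data are made up (deterministic and purely illustrative, with hypothetical slopes). Only the group sizes (38, 31, 29) are taken from the Subjects section.

```python
def make_group(n, slope, intercept=3.0):
    """Illustrative data only: error rate falls as the reasoning score rises."""
    xs = [(i - (n - 1) / 2) / 10 for i in range(n)]
    noise = [((7 * i) % 5 - 2) * 0.1 for i in range(n)]
    ys = [intercept + slope * x + e for x, e in zip(xs, noise)]
    return xs, ys


def slope_heterogeneity_f(groups):
    """F test of equal within-group slopes.

    Full model: separate intercept and slope per group.
    Reduced model: separate intercepts, one common slope.
    """
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    sxx_sum = sxy_sum = syy_sum = 0.0
    rss_full = 0.0
    for x, y in groups:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        syy = sum((b - my) ** 2 for b in y)
        rss_full += syy - sxy ** 2 / sxx       # residual SS, own slope
        sxx_sum, sxy_sum, syy_sum = sxx_sum + sxx, sxy_sum + sxy, syy_sum + syy
    rss_reduced = syy_sum - sxy_sum ** 2 / sxx_sum   # residual SS, pooled slope
    df_num, df_den = k - 1, n_total - 2 * k
    f_ratio = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_ratio, df_num, df_den


# Hypothetical slopes; the steepest is assigned to the feedback group.
groups = [make_group(38, -0.2), make_group(31, -1.0), make_group(29, -0.5)]
f_ratio, df_num, df_den = slope_heterogeneity_f(groups)
print(round(f_ratio, 2), df_num, df_den)
```

With three groups and 98 subjects the test has 2 and 98 - 6 = 92 degrees of freedom, which matches the df reported for the single-covariable analysis.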


Interpretations 


It seems clear that feedback and learner-controlled feedback resulted
in reductions in A-State not obtained under no feedback. Some of the
decrease in A-State probably resulted from the subjects’ adaptation to
the task and to the computer-assisted instruction medium. However, the
crossing of the feedback and learner-controlled feedback group means
over the no-feedback group as shown in Figure 2 suggests that the
treatments were effective in reducing A-State.

Figure 3. Regression of error rate on mean state anxiety (A-State) for
the feedback (FB) and no feedback (NF) groups.

Figure 4. Double covariance analysis with posttest as dependent measure
and associative memory (Ma) and reasoning factor scores as covariables.
(Abbreviations: FB = feedback, LC = learner-controlled feedback, and
NF = no feedback.)

Spielberger, O’Neil, and Hansen (1970) 
found that high A-State subjects performed 
more poorly under conditions of informative 
feedback than did low A-State subjects but 
that high A-State subjects showed a decrease 
in mean number of errors per problem over 
the course. The A-State performance data in 
the present study seem to support those
earlier findings while opposing Campeau’s 
(1968) finding of superior performance 
for high-anxiety girls under the feed- 
back as opposed to the no-feedback con- 
dition. These results might be partially 
explained by the "response interference 
hypothesis" of the drive theory. Spence and 
Spence (1966) suggest that stress-induced 
anxiety results in an increase in drive and 
drive stimulus. The effect of increased drive 
stimulus is to elicit competing responses 
which may interfere with task performance. 
The Xenograde task can be characterized 
as a hypothesis formation-testing task. If 
the subjects' response pattern can be defined
in terms of a hypothesis formation-evalua-
tion-rejection cycle, then individual hypo-
theses about what constitutes an appropriate
rule can be thought of as individual, covert,
mediating responses to example displays.
This then leads to overt attempts to solve
the ensuing problems, with the attempted
solution providing a basis for rejection or
acceptance of the hypothesis in question. In
such a situation, then, high A-State could 
result in the generation of a greater number 
of competing erroneous hypotheses. In order
to test these hypotheses, the subject must
utilize information available in the problem
displays or present in the form of feedback.
If we can assume a limit on the amount
of information a subject can process (that
is, channel capacity, as suggested by Miller,
1967), then it seems reasonable to speculate
that increased information input in the
form of feedback could contribute to an increase
in the proportion of incorrect hypotheses,
thereby producing a loss of efficiency
in the hypothesis formation-evaluation
process. This could account for the greater
number of errors per problem under the
feedback condition for high A-State subjects.

To summarize, it seems plausible that in-
creased anxiety produces a greater number
of competitive responses in the form of 
erroneous hypotheses. Given an upper limit 
on information-processing capacity, the 
subject must now evaluate a higher propor- 
tion of erroneous hypotheses. Feedback in- 
formation further adds to the information- 
processing burden, resulting in reduced 
efficiency and in an increase in the error 
problem ratio. Although speculative, such 
reasoning implies the need for further stud- 
ies in which the effects of high A-State on 
information-processing variables can be 
more carefully examined. An approach to 
such studies is suggested below. 


CONCLUSIONS 


Several tentative conclusions can be
drawn from the results of this study regarding
the effects of feedback on A-State and
the relationships among A-State, ability,
and performance. With respect to the effects
of information feedback on A-State, it appears
that real reductions in A-State can be
obtained through increased use of feedback.
Whether or not this results in higher levels 
of performance or improved learning de- 
pends on other factors, particularly on abil- 
ity factors that are known to be important 
to the task. Feedback seems to help persons 
with high reasoning ability while hindering 
the performance of those with low reasoning 
ability; this suggests a positive relationship 
between reasoning and information-process- 
ing capacities. 

While feedback generally seems to reduce 
A-State, high A-State appears to interfere 
with the learner's capacity to utilize the 
feedback information effectively in per- 
forming the task requirements. Learner con- 
trol, although defined here in a limited 
manner, also seems to offer definite advan- 
tages both in terms of anxiety reduction and 
performance. While the learner-controlled
feedback condition was as effective as the
feedback condition in reducing anxiety, it
also resulted in a substantial reduction in the
amount of work required to complete the
task.


SUGGESTED FURTHER RESEARCH 


The suggested relationships among the 
variables of A-State, information process- 
ing, and learning indicate a need for further 
research. Costello and Dunham (1971) have 
described a methodology, in the form of the
“approach model,” which offers promise for
the investigation of the relationships be- 
tween two classes of variables, those relating 
to task performance and those relating to 
cognitive processes. The procedure embodied 
by the approach model typically involves 
the administration of tests of a mental abil- 
ity on which there is some general consensus 
of acceptance, such as induction or associa- 
tive memory. It also involves the adminis- 
tration of a learning problem—usually a 
concept-learning task—selected for its sus- 
pected ability requirements. The ability 
tests are then submitted to a “rational in- 
formation-processing analysis” and further 
tests are developed. These new tests are 
expected to be tests of the specific informa- 
tion-processing variables that are inherent 
in the ability tests. An example appropriate
to the reasoning ability factor might be hy-
pothesis generation, hypothesis evaluation,
or both of these. An analogous set of tests is 
developed from a rational information-proc- 
essing analysis of the task requirements. A 
factor analysis of the two sets of derived 
test scores will reveal, through common fac- 
tor loadings, factors that are inherent to 
both the task and the ability in question. 

The applicability of the approach model 
is limited to investigation of cognitive 
processes and therefore would not be of 
value in investigating the relationship be- 
tween cognitive and affective processes. It 
should, however, provide a sound methodol- 
ogy for investigation into the relationship 
between reasoning and information-process- 
ing variables. By introducing varied feed- 
back information content into the task as 
an independent variable, one might expect 
differential process requirement structures 
under different feedback conditions, with 
the relationship between process measures 
and performance measures also varying ac- 
cording to feedback condition. Higher feed- 
back may reduce the magnitude of the rela- 
tionship between performance and Process 
A while increasing the magnitude of the 
relationship between performance and Proc- 
ess B. The inclusion of A-State measures at 
various points in the learning task might 
provide a means of determining more pre- 
cisely the nature of the relationship between 
A-State and information-processing ability 
as a function of differential feedback.


A TEST OF THE THEORY OF FLUID AND CRYSTALLIZED
INTELLIGENCE IN MIDDLE- AND LOW-SOCIOECONOMIC-
STATUS CHILDREN:
A CROSS-LAGGED PANEL ANALYSIS


FRANK L. SCHMIDT AND WILLIAM D. CRANO¹
Michigan State University 


The prediction from the theory of fluid and crystallized intelligence
that fluid ability (gf) operates as a cause of crystallized ability (gc)
was tested by means of the cross-lagged correlation panel technique in
samples of middle- (n = 3,944) and low-socioeconomic-status (n =
1,501) elementary schoolchildren. Support for the postulated causal
relationship was found in the middle-socioeconomic-status group, but
results for the low-socioeconomic-status group did not support the
prediction. Possible explanations for this disparity were presented and
discussed.


The theory of fluid and crystallized intelligence, developed and
elaborated primarily by Cattell (1971) and Horn (1968), holds that
there is not one but two general intelligences: fluid ability (gf) is
conceptualized as abstract, essentially nonverbal, relatively
culture-free mental efficiency, while crystallized mental ability (gc)
is viewed as consisting primarily of acquired skills and knowledges and
is thus strongly dependent on cultural exposures. The gf is involved
heavily in performance on such nonverbal tests as figure
classification, figural analogies, and number and letter series tests
and matrices. On the other hand, tests of vocabulary, general
information, abstruse word analogies, and the mechanics of language are
considered to be almost pure measures of gc. (Such tests as arithmetic
reasoning, inductive verbal reasoning, and [illegible] reasoning load
about equally on the gf and gc factors.) The gc is viewed as the
"extent to which one has appropriated the collective intelligence of
his culture for his own use [Horn & Cattell, 1967, p. 111]." The gf is
said to reflect, to a much greater extent, basic neurophysiological
efficiency. An impressive amount of empirical evidence has accumulated
to support this distinction (Cattell, 1963, 1971; Horn & Cattell,
1966a, 1966b, 1967).

¹ Requests for reprints should be sent to William D. Crano, Department
of Psychology, Michigan State University, East Lansing, Michigan 48824.

This theory postulates that in the young child, gf stands in a causal
relation to gc: In addition to such nonintellective causes as
educational opportunities and individual motivation, acquisition of gc
depends on the level of gf (Cattell, 1971, pp. 98-100, 117-118; Horn,
1968).² However, the reverse causal relation is not a part of the
theory; level of gf is considered to be unaffected by [...]
hierarchical group-factor theories advanced by Vernon (1961), Burt
(1949, 1955) and others do contain a pair of factors
(verbal-educational, and practical-mechanical-spatial) that are
superficially similar to gc and gf, they postulate no causal relations
between the two.³ Similarly, neither Thurstone's (1938) nor Guilford’s
(1967) theory contains any such causal propositions. Thus, empirical
verification of the differential causal prediction would lend strong
support to the Horn-Cattell (1966b) position.

² Thus a moderate correlation (.40 to .50) would be expected, and is
found, between the two general ability factors (Cattell, 1971,
pp. 98-100).

³ For a description of the differences, see Horn (1968) and Cattell
(1971, pp. 100-102).

The purpose of the present study is to test 
this causal prediction separately in samples 
of middle- and low-socioeconomic-status 
elementary schoolchildren using the cross- 
lagged panel correlation technique (Camp- 
bell, 1963). 


METHOD 


Subjects 


Subjects were 5,495 students enrolled in the
public schools of Milwaukee who had been admin- 
istered the Lorge-Thorndike Intelligence Test 
(1957 version, Level 3) and the Iowa Tests of Basic 
Skills in Grade 4 and alternate forms of these same 
tests in Grade 6 during the academic years 1963- 
1964 and 1965-1966, respectively. Students from 
schools eligible for comprehensive programs of aid 
under Title 1 of the Elementary and Secondary 
Education Act for the 1967-1968 school year were 
classified as low socioeconomic status (n = 
1,501); students from ineligible schools were con- 
sidered essentially middle socioeconomic status. 


The gf and gc Measures


The Lorge-Thorndike Intelligence Test (Lorge,
Thorndike, & Hagen, 1966) contains both a Verbal
and a Nonverbal scale. The Lorge-Thorndike
Verbal scale is inappropriate to the purposes of
this study because two of its five subtests tap
abilities which can be expected to have a large
loading on gf: Arithmetic Reasoning (Subtest 3)
and Analogies Using Common Words (Subtest 5;
Cattell, 1971, p. 97 ff.; Horn, 1968; Horn & Cattell,
1967). The Lorge-Thorndike Verbal scores are best
considered composites of gc and gf.⁴ The Lorge-Thorndike
Nonverbal, on the other hand, is apparently
a nearly pure measure of gf. Its three
subtests (Figure Analogies, Figure Classification,
and Number Series) are entirely pictorial, diagrammatic,
or numerical in nature. Each of these
kinds of measures has been reported as loading
heavily on gf and negligibly or not at all on gc
(Cattell, 1971, p. 97 ff.; Horn, 1968; Horn & Cattell,
1967). Additional evidence for the construct
validity of Lorge-Thorndike Nonverbal as a measure
of gf is provided in the Lorge et al. (1966)
manual: In two separate samples in which the
Lorge-Thorndike Nonverbal was correlated with
the tests in the Differential Aptitude Tests, its
highest correlations (.76 and .63) were with the
Abstract Reasoning scale, a nonverbal test which
fits Cattell’s (1971, pp. 98-99) description of the
ideal gf measure almost perfectly.


⁴ Scores are not reported separately for subscales
within the Verbal and Nonverbal scales.

The Iowa Test of Basic Skills (Lindquist &
Hieronymus, 1964) consists of five subse[...]:

1. Vocabulary;

2. Reading Comprehension;

3. Language, which is made up of the subscales
Spelling, Capitalization, Punctuation,
and Language Usage;

4. Work-Study Skills, which includes the
subscales Map Reading, Graph and Table
Reading, and Knowledge and Use of References; and

5. Arithmetic, which consists of the subscales
Arithmetic Concepts and Arithmetic
Problem Solving.⁵

Vocabulary, Reading Comprehension, Spelling,
Capitalization, Punctuation, Language Usage
(detection of grammatical errors), and Knowledge
and Use of References are clearly measures of gc
and were employed as such in this study. Map
Reading and Graph and Table Reading [...]
load on general visualization ability (gv), which
is considered as a separate factor in the theory of
gf and gc.⁶ On initial consideration, the Arithmetic
Concepts subscale, which assesses mastery of
concepts involving the number system, de[...],
ratios, etc., appears to be an almost pure measure
of gc, but examination of individual items in this
measure revealed that many involve problem
solving and general reasoning, making it [...]
that this scale taps the gf factor to some extent.
Available evidence (Cattell, 1971, p. 97 ff.; Horn,
1968; Horn & Cattell, 1967) indicates that arithmetic
reasoning (solution of word problems) loads
about equally on gf and gc, and thus the Arithmetic
Problem-Solving subscale was also eliminated as
a gc measure, leaving seven relatively pure measures
of gc.


Analysis 


The analytic procedure employed in this study was the cross-lagged
panel correlational technique (Campbell, 1963; Campbell & Stanley,
[...]; Pelz & Andrews, 1964). When one has a[...] correlational
information relating two variables at two or more points in time, this
technique can allow the inference of causality. This inference is based
on the supposition of all sciences that if a given event (Event 1)
consistently precedes the occurrence of another (Event 2), but the
opposite does not hold, only two conclusions are possible: (a) Event 1
is a cause (possibly only one of many) of Event 2 or (b) both Events 1
and 2 are the effects of some more general cause or causes. In
experimental designs, control of the presentation of the independent
variable is seen to rule out the second alternative, and thus, response
differences between experimental and control subjects are taken as
effects of the experimental manipulation or treatment. Correlational
studies, of course, do not typically entail the controlled application
of an independent manipulation. Both variables are, in effect,
dependent measures. Hence, causal inference on the basis of
correlational results is unsupportable, given the possibility of the
second alternative noted above. Consider, however, the pattern of
results obtained in a simple cross-lagged panel analysis (Figure 1).
For purposes of later exposition, the two hypothetical variables
employed will consist of fluid (f) and crystallized (c) intelligence
measured at two time periods. The two synchronous correlations
($r_{f_1 c_1}$, $r_{f_2 c_2}$) and the lagged autocorrelations
($r_{c_1 c_2}$, $r_{f_1 f_2}$) provide information concerning the
relationship between the two constructs and the temporal stability of
the tests employed in the measurement of these constructs.

Figure 1. Schematic representation of associated relationships in a
cross-lagged panel analysis investigating fluid (f) and crystallized
(c) intelligence (G).

⁵ For a fuller description of each of these scales see Lindquist and
Hieronymus (1964).

⁶ Factors regarded by the theory as separate from (and much less
important than) gf and gc include general visualization (gv), general
spe[...], carefulness (C), and fluency (F).

More central from the standpoint of causal inference are the
correlations crossed and lagged over time (i.e., $r_{f_1 c_2}$,
$r_{c_1 f_2}$). Suppose, for example, that status on gf at Time 1 was
followed consistently by a similar status in gc at Time 2 but that the
opposite did not hold. In that case, $r_{f_1 c_2} > r_{c_1 f_2}$, and
on the basis of the time-precedence notion of causality discussed
above, one could argue that gf was a cause of later gc. If
$r_{c_1 f_2} > r_{f_1 c_2}$, of course, the opposite causal inference
would be favored. In the null case in which
$r_{f_1 c_2} = r_{c_1 f_2}$, either no causal relationship exists or
both variables are the effect of some hidden cause (see Crano & Brewer,
1973).
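
The six correlations of a two-wave, two-variable panel can be computed in a few lines. The Python below is a minimal sketch on made-up scores (the lists f1, c1, f2, and c2 are hypothetical and were constructed so that the Time 2 crystallized score depends mainly on the Time 1 fluid score); it illustrates the comparison of the two cross-lagged correlations only and does not include the reliability corrections or the significance test used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_lagged_panel(f1, c1, f2, c2):
    """Return the synchronous, auto-, and cross-lagged correlations."""
    return {
        "sync_t1": pearson_r(f1, c1),      # f and c at Time 1
        "sync_t2": pearson_r(f2, c2),      # f and c at Time 2
        "auto_f": pearson_r(f1, f2),       # stability of f
        "auto_c": pearson_r(c1, c2),       # stability of c
        "cross_f1_c2": pearson_r(f1, c2),  # earlier f with later c
        "cross_c1_f2": pearson_r(c1, f2),  # earlier c with later f
    }


# Hypothetical scores for eight children, for illustration only.
f1 = [1, 2, 3, 4, 5, 6, 7, 8]
c1 = [5, 2, 7, 1, 8, 3, 6, 4]
f2 = [1, 3, 2, 4, 6, 5, 7, 8]
c2 = [f + 0.25 * c for f, c in zip(f1, c1)]

panel = cross_lagged_panel(f1, c1, f2, c2)
for name, value in panel.items():
    print(f"{name}: {value:.3f}")
```

In these constructed data the correlation of earlier f with later c exceeds that of earlier c with later f, which is the pattern the time-precedence argument reads as f being a cause of later c.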
In the present study, the prediction is that with appropriate
statistical corrections for reliability and communality changes (see
Crano et al., 1972), the crossed and lagged correlation $r_{f_1 c_2}$
(i.e., the correlation between gf in Grade 4 and gc in Grade 6) will be
significantly larger than $r_{c_1 f_2}$ (the correlation between gc in
Grade 4 and gf in Grade 6) for each of the seven measures of gc. If the
predominant direction of causation is from gf to gc rather than vice
versa, it would be expected that the correspondence between relative
standing on gf in Grade 4 and gc in Grade 6 would be greater than the
correspondence between gc in Grade 4 and gf in Grade 6. Confirmation of
this hypothesis would not eliminate the possibility that gc to some
extent influences gf, but it would indicate that the preponderance of
causation is in the direction of gf causing gc.⁷ Another point to be
borne in mind is that, as noted above, gf is postulated as one of a
number of causes of gc. Thus, confirmation of the present hypothesis
would provide evidence about only one gc determinant.

As with most powerful methodologies, the 
cross-lagged panel correlational technique is not 
without its technical problems. One of the most 
difficult of these comes about as a result of the 
time lag which occurs between the administrations 
of the tests of interest. Over the course of time, it 
is conceivable that test characteristics (e.g., com- 
mon factor structure and internal consistency) 
might change and, further, that these changes 
would indicate a significant discontinuity in the 
cross-lagged correlations even in the absence of a 
true causal relationship. For example, suppose that the reliability of
the gf measure increased (or its specificity decreased) over time,
while the reliability of the gc measure decreased (or its specificity
increased). The result would be an increase in $r_{c_1 f_2}$ and a
decrease in $r_{f_1 c_2}$; these effects might be large enough to
indicate a significant disparity between the cross-lagged correlations
even where no causal relationship existed, and they could be of a
magnitude large enough to reverse the true direction of causality.

In attempting to offset this problem, one could 
adopt a factor analytic approach that allows for 
variation in the unique factor loading of each 
variable (over time) but assumes that all orthogo- 
nal common factor loadings are invariant or that 
they change by some multiplicative constant. If 
this assumption is met, then the common factor 
structure of the tests over time would be un- 
changed, although there might well be changes in 
communality and, therefore, uniqueness. Having 
corrected for the effects of this change constant on 
the cross-lagged correlations, one could inspect 
the resulting causal discontinuities without undue 
concern for the potentially biasing effects of unreliability and
factor structure changes.

To test the viability of this assumption, all synchronous Time 1
correlations (e.g., $r_{f_1 c_1}$) could be divided by their
respective Time 2 correlations ($r_{f_2 c_2}$). If the assumption of a
multiplicative constant change in common factor structure is valid,
the resulting ratios will be a single value within the bounds of
sampling error; such a criterion was met in this research. The
correction procedure (the specifics of which are detailed in Crano,
Kenny, & Campbell, 1972, pp.

⁷ A fully adequate test of the theory of gf and gc would have to be
capable of eliminating this possibility, since the apparent prediction
from the theory is that gc cannot operate causally with respect to gf.

267-270; Kenny, 1973) was employed in modifying all cross-lagged
correlations reported below. The general effect of this procedure is
the attenuation of cross-lagged differences; thus, the results to
follow, if inaccurate, will be biased in a conservative direction.
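
The ratio check described above can be sketched as follows. The synchronous correlations used here are hypothetical placeholders (the values of Table 1 are not reproduced), and the sketch only computes the ratios and their spread; deciding whether that spread falls within sampling error requires the formal procedure given in the cited sources, which is not shown.

```python
import numpy as np

# Hypothetical synchronous correlations between the gf measure and
# seven gc measures; illustrative values only, not the study's data.
sync_t1 = np.array([0.50, 0.48, 0.45, 0.52, 0.49, 0.51, 0.53])
sync_t2 = np.array([0.56, 0.55, 0.50, 0.58, 0.56, 0.57, 0.60])

# Under a multiplicative-constant change in common factor structure,
# every Time 1 / Time 2 ratio should estimate the same constant.
ratios = sync_t1 / sync_t2
spread = float(ratios.max() - ratios.min())

print("ratios:", np.round(ratios, 3))
print("mean ratio:", round(float(ratios.mean()), 3))
print("spread:", round(spread, 3))
```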

RESULTS AND DISCUSSION


The intercorrelations among gc and gf measures in Grades 4 and 6 for
both socioeconomic-status groups are presented in Table 1, along with
the means and standard deviations of each test at each time period.
A comparison of the mean gc and gf subscale scores of the two groups
revealed the existence of significant differences on every measure,
with scores invariably favoring the middle-socioeconomic-status sample
(all t differences were significant at p < .0001). Comparisons of test
variance between the groups indicated significantly greater
variability of test scores within the middle-socioeconomic-status
group on each gc measure (all F values were significant at p < .01);
in comparisons involving the gf measure, however, no significant
differences in test variability between the groups were encountered.

When the correlations between all measures were converted to Fisher's
Z scores and were averaged and reconverted, it was found that the
average correlation between the gf measure (Lorge-Thorndike Nonverbal)
and the seven gc measures was .497 in Grade 4 and .560 in Grade 6.
These correlations are within the range predicted by the theory
(Cattell, 1971, p. 99; Horn, 1968). The average correlation among the
seven gc measures was .580 in Grade 4 and .670 in Grade 6. The various
crystallized measures apparently assess somewhat different components
of the (partly) culturally determined gc composite and, thus,
intercorrelate moderately but not highly with each other (cf. Horn,
1968).
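
The averaging step mentioned above (convert each correlation to Fisher's Z, average, and reconvert) can be written as a short sketch. The function name and the example inputs are illustrative assumptions; the individual correlations from Table 1 are not reproduced here.

```python
import math


def average_correlation(rs):
    """Average correlations via Fisher's r-to-Z transform.

    Each r is mapped to Z = atanh(r), the Z values are averaged,
    and the mean Z is mapped back with tanh.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical correlations, for illustration only.
example = [0.42, 0.50, 0.55, 0.47]
print(round(average_correlation(example), 3))
```

Averaging in the Z metric rather than averaging the raw correlations reduces the bias that arises because the sampling distribution of r is skewed when the population correlation is far from zero.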

The corrected cross-lagged correlations between the gf and gc measures
and their associated t values are presented in Table 2.⁸




⁸ The t test employed in this investigation was based on a correction
to the usual test of differences between correlations, suggested
originally by Pearson and Filon (1898). In this analytic procedure,
the indirect correlation between the arrays under comparison is
modified by the four other r values (see also Peters & Van Voorhis,

TABLE 2
CORRECTED CROSS-LAGGED CORRELATIONS BETWEEN FLUID ABILITY (Gf) AND
CRYSTALLIZED ABILITY (Gc) MEASURES FOR MIDDLE- AND
LOW-SOCIOECONOMIC-STATUS SAMPLES

                                        Socioeconomic status
                                   Middle                  Low
Comparison                         r            t            r            t

LT-NV → Vocabulary                 .4844                     .4090
Vocabulary → LT-NV                 .4567        2.182*       .4290        -.580
LT-NV → Reading Comprehension      .4863                     .3147
Reading Comprehension → LT-NV      .4531        2.248*       .3586        -1.50
LT-NV → Spelling                   .3963                     .3931
Spelling → LT-NV                   .3814        [illegible]  [illegible]  -.828
LT-NV → Capitalization             .4900                     .4476
Capitalization → LT-NV             .4539        2.569*       .4676        -.107
LT-NV → Punctuation                .4478                     .4073
Punctuation → LT-NV                .4212        [illegible]  [illegible]  -.497
LT-NV → Usage                      .4651                     .4195
Usage → LT-NV                      [illegible]  [illegible]  .4283        -.308
LT-NV → References                 .5355                     .4465
References → LT-NV                 .4660        [illegible]  .4284        [illegible]

Note. Abbreviation: LT-NV = Lorge-Thorndike Nonverbal scale.
*p < .05.


These findings indicate rather strong support for the theory of gf and
gc in the middle-socioeconomic-status sample. In every case, for
example, discrepancies between cross-lagged correlations were in the
direction predicted, and in five of seven comparisons, these
discrepancies attained statistical significance (in the remaining
instances, one comparison attained the p < .10 level of significance).
Since the theory recognizes numerous causes of gc in addition to gf,
the pattern of causal determination presented in Table 2 is especially
noteworthy. The fact that none of the other leading theories of
intelligence (e.g., Burt, 1949; Guilford, 1967; Thurstone, 1938;
Vernon, 1961) predicts the obtained pattern of causal relations lends
particular value to the data of this investigation. In attempting to
weigh the merits of one of these theories against the others,
researchers in the future must come to grips with this rather
compelling evidence in favor of the Cattell-Horn formulation.

A final observation on the findings of this 
investigation concerns the results obtained 
among the lower-socioeconomic-status group. 
In this subsample, the theory finds no sup- 
port whatsoever. All but one of the obtained 
causal relationships are in the direction op- 
posite to that predicted, and the single con- 
firmatory finding (that for the Knowledge 
and Use of References subtest) did not reach 
significance.⁹

Why gf should show a strong tendency to act causally with respect to
gc among middle- but not lower-socioeconomic-status schoolchildren is
not immediately obvious; certainly nothing in the theory predicts this
outcome. It is possible, however, that the postulated causal mechanism
operates only in the presence of a certain previously existing minimum
level of gc, with the acquisition of this minimum gc level depending
almost solely on cultural and educational exposure opportunities (see
Crano, 1973). In the present research, as in previous studies, large
and significant differences in test scores were found favoring the
middle-socioeconomic-status group; these differences were found at
both grade levels and on all measures. It is therefore conceivable
that the great bulk of the lower-socioeconomic-status sample had not
acquired the minimum levels of competency necessary in the seven gc
areas assessed to activate the gf → gc causal mechanism. The relative
depression of the


⁹ The four tests excluded from the analysis show the same pattern of
results. All differences are in the predicted direction for the
middle-socioeconomic-status sample and in the opposite direction for
the low-socioeconomic-status group. For Map Reading, Graph and Table
Reading, Arithmetic Concepts, and Arithmetic Problems, t values were
2.248, 2.254, 1.105, and 1.404, respectively, in the
middle-socioeconomic-status group, and -1.500, -.850, -.055, and
-3.060, respectively, in the low-socioeconomic-status group. However,
because of possible contamination by factors other than gc, results
for these measures should be interpreted with caution, if at all.
means and variances of the gc measures in the
lower-socioeconomic-status sample lends support to the plausibility of
this speculation. This hypothesis could be further tested by comparing
the patterns of causality obtained in different gc score ranges within
low- and middle-socioeconomic-status groups.

An alternative explanation can perhaps be found in the area of
motivational differences. The gf is postulated as a necessary but not
sufficient cause for gc, as is individual motivation (Horn & Cattell,
1967). If individual motivation to acquire gc abilities is relatively
uniform (and moderately high) in middle-socioeconomic-status children,
the role of motivation in creating individual differences in gc would
be expected to be small relative to that of gf. Conversely, if
individual motivational levels were more variable in the
lower-socioeconomic-status sample, the relative causal influence of gf
on gc would be expected to be smaller and that of motivation larger.

This motivation-based proposition would appear to call for greater gc
variability in the lower-socioeconomic-status group, a finding
opposite to that obtained in this research. It is conceivable,
however, that motivational differences would affect gc variance only
after some minimal threshold achievement level is surpassed. On the
basis of this reasoning, we would assume that even high levels of
motivation will have little implication for achievement among children
of very low gc. In future studies, the motivational-differences
hypothesis can be tested by cross-lagging measures of motivation with
gc measures in lower- and middle-socioeconomic-status groups with the
prediction that motivation would exert the strongest causal influence
on gc in lower-socioeconomic-status groups of moderate-to-high
academic achievement.

In summary, the present study provides strong support for the
proposition, derived from the Cattell-Horn theory of intelligence,
that gf is related in a causal manner to gc, at least in
middle-socioeconomic-status children. The data from the
low-socioeconomic-status sample, however, did not support this
prediction: gf apparently has little implication for gc among children
of low socioeconomic status (and low levels of achievement). Possible
avenues for exploring this discrepancy have been proposed and will
hopefully lead to a more complete understanding of the mechanisms
which underlie these apparent differences in the causal factors
influencing academic and intellectual development.


TRAINING IMAGERY PRODUCTION IN YOUNG CHILDREN
THROUGH MOTOR INVOLVEMENT¹


WILLIAM H. VARLEY,² JOEL R. LEVIN,³ ROGER A. SEVERSON,
AND PETER WOLFF


University of Wisconsin—Madison 


Kindergarten and first-grade children were given a paired-associate
learning task following one of five types of strategy-training
procedures. In the motor-training conditions, subjects generated
interactions involving pairs of toys by playing with them or by
drawing pictures of them. It was found that relative to simple imagery
practice, motor training facilitated the performance of
kindergartners, with no differences among four motor-training
variations. In the first grade, imagery practice by itself was as
effective as each of the motor-training procedures. The results were
discussed in terms of Piaget's theory of cognitive development and
were contrasted with previously unsuccessful attempts to induce
self-generated elaboration strategies in young children.


It is well established that paired-associ- 
ate learning is facilitated by the addition of 
experimenter-provided elaborations (Roh- 
wer, 1967). However, this is not always true 
when young children (typically below eight 
years of age) are asked to generate such 
elaborations themselves (Levin, 1972; Roh- 
wer, 1972; but see McCabe, Levin, & Wolff, 
in press). Given that young children are un- 
successful in generating sentence and im- 
agery elaboration strategies on request, the 
possibility exists that with systematic in- 
struction or training they may be taught to 
do so. However, initial efforts to train mne- 
monic elaboration in young children have 
been disappointing (Rohwer & Ammon, 
1971; Rohwer, Ammon, & Levin, 1971), 
leading Rohwer (1972) to conclude that 


¹ The research reported here was done as part of the first author's
doctoral dissertation, with support from the Wisconsin Research and
Development Center for Cognitive Learning at the University of
Wisconsin, Madison. We are grateful to Billie Albrecht for typing the
final draft of the paper.

² Now at Dartmouth-Hitchcock Mental Health Center, Hanover, New
Hampshire.

³ Requests for reprints should be sent to Joel R. Levin, Wisconsin
Research and Development Center, 1025 West Johnson Street, Madison,
Wisconsin 53706.
children must attain a certain age or maturational level before they
can be taught to use elaboration strategies effectively.

Considering the above failures, such a conclusion is tempting. On the
other hand, it has been recently demonstrated that (supposedly)
“pre-imagery” children can be induced to generate facilitative imagery
elaborations through concurrent motor involvement (Wolff & Levin,
1972). In that experiment, it was found that although [...] and
seven-year-olds did not benefit from instructions to imagine an
interaction between pairs of toys, when they were permitted to play
with toys concurrently, their performance improved dramatically (even
though they were not allowed to see the manipulations they generated).
This result was interpreted as being consistent with Piaget's theory
of cognitive development, in which it is assumed that the
preoperational child cannot produce dynamic visual representations
(internally) without motor activity involving the events to be
represented (Piaget, 1962).

Danner and Taylor (1973) were able to induce imagery in first graders
by giving them pretraining in drawing pictures of separate objects
interacting. Although these investigators did not interpret their
findings in terms of motor involvement, Piaget and Inhelder (1971)
would view playing and drawing activities similarly, with each
resulting in internalized imitation (visual imagery). Thus, the
important distinction between the Danner-Taylor and Wolff-Levin
experiments is that in the former, the motor activity (i.e., the
pretraining) was temporally removed from the criterion performance
(and involved different items), whereas in the latter, the motor
activity and criterion performance were concurrent.

The purpose of the present study was 
twofold: (a) to investigate differences in 
performance of younger and older children 
within the imagery-transition stage and (b) 
to extend the motor-imagery separation re- 
sults to other types of training procedures. 


METHOD 


Subjects 


Eighty children apiece from the kindergarten and first grade of a
small, middle-class community in southern Wisconsin served as the
subjects.⁴ All testing was done during the spring of 1972. At that
time, the median age of the kindergartners was 6 years, 1 month, and
that of the first graders was 7 years, 1 month. The subjects were
taken in turn from their classrooms and randomly assigned (in equal
numbers) to one of five treatment conditions at the time they entered
the experimental room.


Design and Materials 


Following Wolff and Levin (1972), a paired-associate task was
constructed with small children's toys comprising the stimulus
materials. Fifteen pairs were created for the learning task, with an
additional eight pairs used for practice and examples during training.
All pairs could be easily labeled by kindergarten and first-grade
children, as determined from an earlier pilot study. Each pair (e.g.,
cowboy-car) was formed randomly, subject to the constraint that a
plausible interaction existed for the two paired toys.

Prior to the learning task, the subjects were given one of five types
of strategy training. In the imagery control condition, the
experimenter demonstrated a predetermined interaction for each of the
first four training pairs, after which the subject was given
instructions to imagine an interaction for each of the four remaining
pairs. This nonmotor-imagery condition was incorporated as a baseline
for evaluating the effects of four motor-imagery training procedures
in each grade, as follows: In two conditions, the subjects were
instructed to generate interactions for each of the eight training
pairs, either by playing with the toys (repeated play) or by drawing a
picture of them in interaction (repeated draw). These conditions
approximated those used by Wolff and Levin (1972) and Danner and
Taylor (1973), respectively. Finally, in another two conditions, the
subjects were given playing (or drawing) practice on the first four
training pairs, followed by a delayed play (or draw) instruction for
the remaining four pairs in which the subjects were required to
indicate that they had generated an imaginal interaction before they
were permitted to execute it through playing with (or drawing) the
toys. It was assumed that these latter two conditions (faded play and
faded draw) would produce better subsequent performance than their
repeated counterparts (especially for kindergartners), since the
subjects would be receiving practice in generating imagery apart from
motor activity.


Procedure 


In order to monitor the elaboration of the subjects during training,
interactions created by the subject or the experimenter were briefly
described by the experimenter without labeling the toys (e.g., “Look,
it's chasing that.”). The subject's verbalizations regarding his
interactions were suppressed by the experimenter during training so
that facilitative effects of subject-generated verbalization would not
cloud an imagery interpretation of the training effects. It was
difficult to equate the amount of time across training conditions.
Altogether, the subject spent about 25 minutes with the experimenter
in the imagery control, repeated play, and faded play conditions, and
about 35 minutes in the repeated draw and faded draw conditions.

The learning task itself was presented by way of an
incidental-learning format because it was thought to be less reactive
in this kind of training experiment, and since no differences in
performing the task as a function of intention to learn have been
found with subjects of this age (cf. Wolff, Levin, & Longobardi, in
press). The subject's introduction to the learning task was the same
across all conditions, but the “set” given to each subject was based
on the training he had received.⁵

All subjects were tested individually. Following 
the 8 training items, the 15 learning task pairs 
were presented one at a time at a 10-second rate 
(timed by the experimenter). After the last pair 


⁴ Thanks are due to the staff and students of the Sauk Prairie Public
Schools in Sauk Prairie, Wisconsin.

⁵ The instructions given to the children within the various
conditions, and a list of the stimulus materials employed, are
available from the second author on request.

was presented, the 15 response toys were displayed, and for each
stimulus toy (presented in a different random order than the study
items), the subject was required to pick up the response toy with
which it was initially paired. The subjects were given 7 seconds to
respond to each stimulus toy. Each of the subject's selections was
replaced in the array before the next stimulus toy was presented.


RESULTS 


Learning was defined as the total number 
of correct responses (out of 15) during the 
recognition test. The mean number of cor- 
rect responses, by condition and grade, is 
presented in Figure 1. In the analysis of 
variance, comparisons were nested within 
grades so that statements about the effec- 
tiveness of the different conditions could be 
made separately for each grade. To do this, 
each motor-training condition was com- 
pared with the imagery control condition 
via Dunnett’s test (one-tailed, α = .05). In
addition, the factorial combination of the 
nature (play or draw) and extent (repeated 
or faded) of motor involvement yielded 
three orthogonal comparisons among train- 
ing procedures within each grade. 
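
The three orthogonal comparisons implied by the 2 x 2 arrangement of motor-training conditions can be sketched as contrast vectors. The ordering of conditions, the coefficient coding, and the cell means below are assumptions made for illustration; the study's actual cell means appear only in Figure 1 and are not reproduced.

```python
import numpy as np

# Assumed condition order for the four motor-training groups.
conditions = ["repeated play", "faded play", "repeated draw", "faded draw"]

# One contrast per row: nature (play vs. draw), extent (repeated vs.
# faded), and their interaction.
contrasts = np.array([
    [1,  1, -1, -1],   # nature of motor involvement
    [1, -1,  1, -1],   # extent of motor involvement
    [1, -1, -1,  1],   # nature x extent interaction
])

# Off-diagonal zeros in the Gram matrix confirm mutual orthogonality
# (equal cell sizes are assumed, as in the design described).
gram = contrasts @ contrasts.T

# Hypothetical cell means (number correct out of 15), illustration only.
means = np.array([10.0, 10.5, 9.5, 10.0])
estimates = contrasts @ means

for label, value in zip(["nature", "extent", "interaction"], estimates):
    print(f"{label:>11}: {value:+.2f}")
```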

Within the kindergarten sample, Dunnett's test indicated that each
motor-training condition produced significantly better learning than
did the imagery control condition. However, within the first-grade
sample, no motor-training condition differed significantly from the
imagery control condition. In addition, no significant differences
among the four motor-training conditions were detected in either grade
(all ps > .05).

Figure 1. Mean number of correct responses by grade and experimental
condition.

A post hoc breakdown of these effects yielded further interesting
information. When the data were analyzed separately for boys and
girls, it was found that the comparability of the imagery control and
motor-training conditions in the first grade was true only for girls
(|t| < 1); for boys, however, significant differences in favor of
motor training still existed (t = 1.86, df = 29, p < .05, one-tailed)
as they had for both sexes in the kindergarten sample (girls,
t = 2.96, df = 32, p < .01; boys, t = 2.57, df = 38, p < .025).
Despite these different effects for boys and girls, the previous
finding of no significant differences among the four motor-training
conditions did not change when investigated for each sex group
separately.


DISCUSSION


The results of the present study indicated that for kindergarten
children, each motor-training condition resulted in significantly
better learning than the nonmotor-imagery control condition. However,
the effect due to motor training was not observed in the first-grade
sample where motor and nonmotor training resulted in nonsignificant
learning differences. It might therefore be inferred that between the
ages of six and seven the young child is becoming increasingly adept
at generating dynamic representations in the absence of concurrent
motor involvement or without immediate prior training. This age range
may well be adjusted depending on the sex of the child (with the
present data suggesting that the ability develops earlier in girls
than in boys) as well as on the particular sociocultural
characteristics of the population being considered.

The finding that playing and drawing 
training did not differ statistically is consist- 
ent with Piaget and Inhelder’s (1971) view 
that both represent vehicles for visually in- 
ternalizing external stimuli. The failure of 
the faded training conditions to produce 
learning superior to the repeated conditions 
was unexpected. However, it was noted by 
the experimenter that the subjects in the re- 
peated conditions typically hesitated before 
either playing with or drawing the toys for 
each training pair. This observation sug- 
gests that the subjects in these conditions 
were thinking up (imagining?) interactions 
before they executed the motor activity, 
even though they were not explicitly in- 
structed to do so as in the faded conditions. 
Such an observation, of course, also hints at 
additional research (utilizing response la- 
tencies, delayed visual feedback, motor 
blockage, and other variables) to specify 
more precisely the temporal antecedents of 
imagery production in young children. 

The results of this study have implica- 
tions for future efforts to train imagery 
elaboration in children of this age. They 
strongly support the assumption that proce- 
dures designed to train cognitive skills must 
take into account the information-processing
skills of the young child as postulated
by developmental learning theories. The 
present training procedures, based on Pi- 
aget’s theory of cognitive development, in- 
dicate that motor involvement should be a 
dominant feature of imagery training in 
young children, an approach not fully ex- 
ploited by previously unsuccessful training 
studies (Rohwer & Ammon, 1971; Rohwer, 
Ammon, & Levin, 1971). 

It should be noted that the learning materials of the present study
consisted of toys, which may be regarded as being more concrete (in
Paivio's, 1971, terms) than the pictures and aurally presented words
of Rohwer's (1967) earlier work and which may be assumed to evoke
dynamic images more readily (Levin, 1972). However, perhaps one of the
most exciting results here was the marked facilitative effects of the
drawing-training conditions, substantiating those of Danner and Taylor
(1973). A worthwhile direction for future training studies with young
children would be to [...] to the learning of [...] or aural materials
while honoring the need for motor involvement in the training process.
The fact that the playing and drawing conditions resulted in equal
degrees of facilitation here strengthens this conclusion.


SOCIAL-EMOTIONAL, COGNITIVE, AND DEMOGRAPHIC
DETERMINANTS OF POOR SCHOOL ACHIEVEMENT:
IMPLICATIONS FOR A STRATEGY OF INTERVENTION¹

New York, New York


The effect of three classes of variables (preschool cognitive
functioning, preschool social-emotional functioning, and
background-demographic variables) on early elementary school
achievement was examined. Two hundred and nine black and white boys
from lower- and middle-class backgrounds were evaluated during the
preschool period and received achievement tests during the second year
of elementary school. Each of the three classes of variables accounted
for a significant proportion of the variance of the criterion
measures. When the classes of variables were examined by means of
hierarchical regression technique, the social-emotional and cognitive
variables yielded the most information for programs of psychological
intervention. Intervention directed at the social-emotional components
of cognitive performance was discussed.



Social scientists have been increasingly 
concerned with the antecedents of school 
success. Attempts at early cognitive reme- 
diation, such as some Head Start and Home 
Study programs, have been based on the 
assumption that disadvantaged home envi- 
ronments fail to provide the early learning 
experiences needed to cope with the intel- 
lectual demands of school. These interven- 
tion programs have focused primarily on 
the remediation of cognitive handicaps and 
have paid little attention to the relationship 
between social-emotional functioning and 
learning difficulties and underachievement. 


¹ This research was carried out under Grant 16944 from the National
Institute of Mental Health to the first author.

The authors gratefully acknowledge the excellent cooperation extended
by the Division of Day Care of the New York City Department of Social
[...] and the New York City Board of Education. The authors are
particularly indebted to their statistical consultant, Jacob Cohen,
for the application of his innovative statistical technique to the
problems and issues set forth in this study.

² Requests for reprints should be sent to Martin Kohn, William Alanson
White Institute, 20 West 74th Street, New York, New York 10023.


Wechsler (1971) has suggested that intel- 
lectual achievement is “dependent, to vary- 
ing degrees, upon a variety of determinants 
which are more of the nature of connative 
or personality traits rather than of cognitive abilities [p. 51].”
These observations are strongly confirmed in the clinical literature
on emotional disturbance and learning difficulties (Pearson, 1952,
1954); systematic evidence is sparse.

Kohn and Rosman (1972a) have developed a two-factor model of the
child's social-emotional functioning in the preschool setting as
follows: Factor I: Interest-Participation versus Apathy-Withdrawal;
Factor II: Cooperation-Compliance versus Anger-Defiance. They have
established that these dimensions (a) assess relatively enduring
attributes of children across settings and over time (Kohn & Rosman,
1972a, 1973b), (b) are valid measures of emotional health-disturbance
(Kohn & Rosman, 1973c), and (c) are relevant to preschool cognitive
functioning and to later school achievement (Kohn & Rosman, 1972b,
1973a). Apathy-Withdrawal, but not Anger-Defiance, was predictive of
poor achievement (Kohn & Rosman, 1972b). More recently, a third factor
dimension of social-emotional functioning (Task Orientation) was found
to be highly correlated with cognitive functioning at the preschool
level (Kohn & Rosman, 1973a).

The purpose of the present study was 
two-fold: 

1. To determine the extent to which early elementary school
achievement is a function of each of three classes of variables:
preschool social-emotional functioning, preschool cognitive
functioning, and background-demographic variables. It was hypothesized
that low achievement is predicted by social-emotional disturbance, by
cognitive impairment, and by low status on a series of
background-demographic variables related to social
advantage-disadvantage. Since the cognitive variables were in the same
domain as the criterion measures, they were expected to be the best
predictors.

2. To generate information expected to be useful in remediation and
intervention programs by examining the joint effect of the three
classes of variables on second-grade school achievement. Basic to the
study was the assumption that deficits and disturbances in preschool
social-emotional and cognitive functioning would be more remediable,
at least by psychologists, than disadvantaged status as reflected in
the background variables. The following questions were posed:

1. To what extent do the two classes of psychological variables (i.e.,
the social-emotional and cognitive variables) predict over and above
their separate effects?

2. In what sequence do these two classes 
of variables yield the maximum amount of 
information, that is, when social-emotional 
variables are examined over and above the 
variance accounted for by the cognitive 
variables or vice versa? 

3. To what extent are the background- 
demographic variables predictive of second- 
grade school achievement after the psycho- 
logical measures have been partialed out? 

The individual variables within each 
class and their relevant hypotheses appear 
in the next section. 
METHOD


Subjects
Subjects were 209 second-grade boys 
two years earlier when they were enro 
York City public school kindergartens. 
garten is referred to as “preschool” simpl 
cate that it is the period prior to the onset, 
education and that emotional impairment 
this period is unlikely to be due to aca 
ure. Originally, 217 children were include 
study; vigorous follow-up efforts result 
location of 96.4% of the sample in public, 
and parochial schools in the metropolitan 
urban New York area as well as in other 8 
Subjects had been selected, in appro 
equal numbers, from three social class le 
on the Hollingshead Two-Factor Index 
Position‘; ranked from low to high, these I 
Class V, IV, and III plus (III or better) 
the sample was white and half black; bla 
white subjects were distributed almost. 


riod appear in Kohn and Rosman, 1978a. 

The predictor variables (social-emotioi 
tioning, cognitive functioning, and backgi 
demographic data) were collected during
school period while subjects were attendin 
garten. Their mean age at the time was 
(5 years 8 months). The criterion. 
(achievement measures) were collected 
children were in second grade; mean 
time was 88 months (7 years 4 months). 


Measures of Social-Emotional Functioning


Instruments. Social-emotional functio 
assessed by means of three instrumenti 
Social Competence Scale, Kohn Problem € 
(Kohn & Rosman, 1972a), and Schaefer
Behavior Inventory.’ The two Kohn s 
measure two major dimensions of social-et 
functioning: (a) Factor I—Interest- 
versus Apathy-Withdrawal and (b) Fi 
Cooperation-Compliance versus Anger: 

Factor I indicates the extent to whi 
displays interest and curiosity, utilizes 
tunities of the classroom, and interacts 


*Included in the original sample we 
dren from New York City day care cente 

* A. B. Hollingshead. Two-factor inde 
position. Unpublished manuscript, 1957. 
from A. B. Hollingshead, Department of. 
Yale University, New Haven, Connec! 

* E. S. Schaefer and M. R. Aaronson. 
behavior inventory: Preschool to p! 
published manuscript, 1966. (Available 
Schaefer, Department of Maternal B 
Child Care, Duke University, D 
Carolina 27701.) 
with his peers or, on the other hand, to what extent
he withdraws and displays little interest in
his peers or in kindergarten activities. High rat- 
ings on Apathy-Withdrawal had been associated 
with poor cognitive functioning at the preschool 
level (Kohn & Rosman, 1973a); in the present
study it was hypothesized that children high on 
Apathy-Withdrawal would continue to perform 
more poorly in elementary school than those high 
on Interest-Participation. 

Factor II indicates the extent to which the child 
can accept the routines of a classroom and comply 
with teachers’ requests or suggestions or, on the 
other hand, to what extent he is defiant and creates 
disturbances which disrupt normal classroom rou- 
tines. Only minimally related to preschool cogni- 
tive functioning, this factor was expected to be 
nonpredictive of the second-grade achievement 
criteria. 

The Schaefer Inventory yields three major 
dimensions: (a) Factor I—Extroversion versus 
Introversion, (b) Factor II—Adjustment versus
Maladjustment, and (c) Factor III—High versus 
Low Task Orientation. 

In previous work (Kohn & Rosman, 1972a),
the Schaefer Factor I and II dimensions were
found to be congruent and substantially correlated
with the Kohn Factors I and II. Schaefer Factor
III, Task Orientation, which indicates the extent
to which the child is able to concentrate, become
absorbed, and persevere in activities or, on the
other hand, to what extent he is hyperactive and
distractible, has a low attention span, and shows
low motivation to “stick to” tasks, has no equivalent
on the Kohn scales. Substantially related to
preschool cognitive functioning (Kohn & Rosman, 
1973a), this variable was expected to be one of the 
better predictors of second-grade achievement. 
Procedure. Each child was rated once by his
kindergarten teacher on each of the three instru- 
ments. To achieve greater stability of the mea- 
sures, the corresponding scores from the three 
instruments were converted to standard scores and 
pooled. The resulting pooled Factor I and II scores 
were given the Kohn factor designations for pur- 
poses of convenience and consistency. Schaefer 
Factor III was retained without pooling since 
there is no Kohn factor equivalent. This dimension 
was substantially correlated with Pooled Factor 
II (r = .68) in line with previous findings (Kohn &
Rosman, 1972a°). Interrater reliabilities of the 
pooled dimensions had been adequate (rs ranging
from .76 to .90) in an earlier study (Kohn & Ros- 
man, 1972a). On each of the three instruments 
used, a high score indicated disturbance. 
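
The pooling step described above can be illustrated with a minimal sketch. It assumes that "pooled" means a simple average of the standard scores and that the population standard deviation is used; the text states neither, and the ratings below are hypothetical. All instruments are assumed to be scored in the same direction (high score = disturbance), as the text indicates.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to standard (z) scores."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption, not stated in the text
    return [(s - m) / sd for s in scores]


def pool(*instruments):
    """Average the z scores of corresponding dimensions, child by child."""
    z_columns = [standardize(column) for column in instruments]
    return [mean(values) for values in zip(*z_columns)]


# Hypothetical teacher ratings of four children on one dimension,
# taken from three instruments with different raw-score ranges.
kohn_competence = [10, 14, 18, 22]
kohn_problem = [3, 5, 7, 9]
schaefer = [40, 50, 60, 70]

pooled_factor = pool(kohn_competence, kohn_problem, schaefer)
print([round(v, 3) for v in pooled_factor])
```

Standardizing first keeps an instrument with a wide raw-score range from dominating the pooled score.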


* E. S. Schaefer, L. F. Droppleman, and A. F.
Kalverboer. Development of a classroom behavior
checklist and factor analyses of children’s school
behavior in the U.S. and the Netherlands. Unpub-
lished manuscript, 1965. (Available from E. S. 
Schaefer, Department of Maternal Health and 
Child Care, Duke University, Durham, North 
Carolina 27701.) 
Measures of Cognitive Functioning


Instruments. Five measures made up the class
of cognitive predictors. The Stanford-Binet Intelli- 
gence Scale (Form L-M) is a widely used, well- 
standardized measure of general intelligence that 
is highly predictive of later school achievement 
and learning. The four other measures were derived 
from a factor analysis of a large number of cogni- 
tive tests tapping various areas of cognitive func- 
tioning; the specific tests have been described in 
Kohn and Rosman (1973a).

1. Factor I—Visual Cognition. This factor sub- 
sumes a group of tasks demanding visual attention, 
analysis of visual material, and processing of visual 
cues. Nonverbal in presentation and performance, 
the tasks are composed of simple and concrete 
visual stimuli that minimize previous learning but 
require active mental processes such as classifying 
and reasoning. This factor was expected to show 
low correlation with school achievement, particu- 
larly Word Knowledge and Reading. 

2. Factor II—Verbal Expressivity reflects the 
child's readiness to speak, his verbal productivity 
and the coherence and clarity of his verbal com- 
munications; it was expected to account for only 
a trivial amount of variance in the achievement 
measures. 

3. Factor III—Motor Control indicates the extent
to which the child is able to follow directions
in carrying out a task and may also reflect his 
ability to channel his energies into task-appropriate 
activities. Children who scored low on this factor 
were expected to perform more poorly than those 
who scored high. 

4. Factor IV—Verbal Cognition subsumes a
group of tasks commonly assumed to measure 
"verbal ability." Major components of the factor 
include passive and active vocabulary, verbal 
concepts, auditory attention, discrimination, and 
memory. This factor was expected to be most rele- 
vant to school achievement, particularly Reading 
and Word Knowledge. 

Because factor analyses of preschool intelligence
tests have revealed factorial dimensions similar to 
the four cognitive factors of the present study 
(Stott & Ball, 1965), it was expected that these 
four factors would, jointly, predict as well as the 
Stanford-Binet. The Stanford-Binet itself was ex- 
pected to be a better predictor than any one of the 
four factors.

Procedure. The children were tested individually 
in five half-hour sessions on the series of cognitive 
tasks by four testers trained at the master’s level 
in school psychology. The testers were female, 
white, and in their twenties; no evidence was found 
of any systematic bias or other differences attrib- 
utable to the different testers. 


Background-Demographic Measures 


Home interviews with mothers yielded data on 
five background variables previously hypothesized 
to be relevant to school achievement (Kohn &
Rosman, 1973a).
1. Social class as measured by the Hollingshead
index; occupation and education of head of house- 
hold were scaled, differentially weighted, and com- 
bined; the resulting scores formed a continuum 
ranging from 11 (low) to 77 (high). 

2. Race (0 = black, 1 = white).

3. Welfare status (0 = family receives welfare
assistance, 1 = no assistance received).

4. Family size (number of children in nuclear
family).

5. Family intactness (2 = child lives with both
natural parents, 1 = other).


Achievement Measures 


Three parts of the Metropolitan Achievement
Test (Form B) were administered to assess second-grade
achievement: Word Knowledge, Reading,
and Arithmetic Concepts and Skills. All of
the subjects were tested, individually or in pairs, 
on the Primary I battery; children receiving a 
perfect score on one or more parts of the test were 
further tested on Primary II. Raw scores on the 
three subtests were converted to grade equivalents 
on the basis of the Metropolitan Achievement 
Test norms. Testing was carried out by three
college student research assistants trained to ad- 
minister the instruments. These students were in 
their twenties, female, and white. Seven children 
who lived outside of the metropolitan area were 
tested by their own second-grade teachers. 


Data Analysis 


The data were analyzed by means of correla- 
tional and hierarchical multiple regression tech- 
niques. Cohen (1968) had demonstrated the equi- 
valence of multiple and partial regression on the 
one hand and analysis of variance and analysis of 
covariance on the other hand. In the hierarchical 
multiple regression technique, the independent 
variables are arrayed in sets, and the multiple cor- 
relation coefficients are determined as successive 
sets are added to the analysis. The increment in 
multiple correlation squared obtained by adding 
a new set to the analysis may be interpreted as 
the proportion of variance accounted for by the 
new set after variance associated with the pre- 
ceding sets has been partialed out. An F test of 
the significance of the increment is easily carried 
out. 
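
The hierarchical procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the data are synthetic, the function names are hypothetical, and each increment is tested against the residual variance of the model at the step where the set is entered (other error terms are possible).

```python
import numpy as np


def r_squared(X, y):
    """R² from an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_increments(sets, y):
    """Enter predictor sets in order; return (R² increment, F, df1, df2) per set."""
    n = len(y)
    entered = np.empty((n, 0))
    r2_prev = 0.0
    results = []
    for X in sets:
        entered = np.column_stack([entered, X])
        r2 = r_squared(entered, y)
        df1 = X.shape[1]                 # variables in the set being added
        df2 = n - entered.shape[1] - 1   # residual df after this step
        f = ((r2 - r2_prev) / df1) / ((1.0 - r2) / df2)
        results.append((r2 - r2_prev, f, df1, df2))
        r2_prev = r2
    return results


# Synthetic data only; the variable names are illustrative.
rng = np.random.default_rng(0)
n = 209
cognitive = rng.normal(size=(n, 2))
social = rng.normal(size=(n, 3))
y = cognitive @ np.array([1.0, 0.5]) + 0.3 * social[:, 0] + rng.normal(size=n)

steps = hierarchical_increments([cognitive, social], y)
for name, (inc, f, df1, df2) in zip(["cognitive", "social-emotional"], steps):
    print(f"{name}: R2 increment = {inc:.3f}, F({df1}, {df2}) = {f:.2f}")
```

Reversing the order of the list passed to `hierarchical_increments` gives the other entry sequence; the increments change with the order, but their sum (the final R²) does not.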

Sets that are significant are further examined to 
determine which of the variables within the set 
contribute to the significance of the set. Partialed 
out from a specific variable in a given set are the 
variables in the preceding sets (if any) as well as 
the remaining variables within the same set. The 
cumulative contribution of all the sets gives the 
final multiple correlation squared with the de- 
pendent variables. 

Two series of analyses were carried out. The first 
examined the amount of variance accounted for 
by each of the three classes of independent varia- 
bles and tested the hypotheses about the individual 
variable in each class. The second se:
the effects obtained by combining c 
dependent variables. The following 
mined: first, the amount of variance 
by social-emotional variables after the 
variables had been partialed out and 
amount of variance accounted for by thi 
variables after the social-emotional va 
been partialed out. These analyses also 
formation about the total amount of 
counted for by the psychological varial 
the amount of variance accounted for by 
ground-demographic variables after the t 
of psychological variables had been pai 
was determined. This final step also yi 
formation about the proportion of vai 
counted for uniquely by the background y 


RESULTS 


By Class of Variables 


The extent to which the three cl 
variables predicted the achievement: 
is shown in Table 1. The percenta; 
iance in the dependent measures 
for by each class of variables is lis 
the heading R²% (i.e., R² × 100);
order correlations between the i 
variables and the criteria appear in. 
umn headed r,; the partial co! 
(when other variables within the 
are partialed out) are given in the 
headed ry.” 

Social-emotional variables. The 
cial-emotional variables together i 
for 16%-22% of the variance 
achievement measures. Individuall 
shows significant zero-order correl 
the criteria. When the other two. 
emotional variables are partialed 0 
each other so that the unique coni 
of each can be assessed, Factors I 


achievement is predicted uniquely b 


* Age at time of preschool testing was 
out prior to the cognitive variables as 
control; this linearly age adjusted the 
measures. Age at time of second-grade 
partialed out prior to the backgroun: " 
this variable was irrelevant for the kin: 
measures but the statistical control was 
s remove spurious background age of 
ects. 
of these two dimensions of preschool social-
emotional disturbance. 

Cognitive variables. The cognitive variables
together account for about 41% of the
variance in each of the dependent variables.
The Stanford-Binet, contrary to expectations,
contributes a small but significant increment
over and above the effects of the
cognitive factors.

In the zero-order correlations, the Stan- 
ford-Binet is the best single predictor, but 
Factors I and IV, surprisingly, predict al- 
most as well. It is particularly surprising 
that Cognitive Factor I, composed of non- 
verbal tasks seemingly having little in com- 
mon with achievement tests, predicts almost 
as well as Cognitive Factor IV and the 
Stanford-Binet. 

When the cognitive factors are partialed 
from each other, only Visual Cognition (I) 
and Verbal Cognition (IV) remain signifi- 
cant; for remedial and predictive purposes, 
then, these are the important dimensions. 
The findings support the hypothesis that 
low school achievement is predicted by 
preschool cognitive level. 

Background-demographic variables. Taken
together, the background variables account
for 19%-22% of the variance in the
criterion measures. All variables show significant
zero-order correlations with each of
the achievement measures except for one
correlation, namely, that of Family Intactness
to Word Knowledge.

After the variables are partialed out from 
each other, higher achievement is, as hy- 
pothesized, associated with higher social 
class. At the zero-order level, white children 
performed better than black children. This 
difference was expected to diminish sharply 
after partialing; comparison of the zero-order
and partial correlations shows little difference.

Overall, low status on the social advan- 
tage-disadvantage continuum is related to 
low school achievement. 

Summary. The social-emotional variables 
account for 16%-22% of the variance of the 
criterion measures, the cognitive variables 
for about 41%, and background-demo- 
graphic variables for 19%-22% of the vari- 
ance. 

While it is not surprising that the cognitive
variables predict twice as much variance
as the background-demographic and
the social-emotional variables, it is interesting
that the social-emotional variables predict
as well as the background-demographic
variables.


Joint Effect of Psychological Variables 


The extent to which the social-emotional 
and cognitive variables, taken jointly, pre- 
dict the achievement criteria may be seen 
in Table 2. In Sequence 1, the social-emo- 
tional variables contribute very little over 
and above the variance accounted for by 
cognitive variables. Preschool social-emo- 
tional functioning and background-demo- 
graphic variables have been shown to ac- 
count for major proportions of preschool 
cognitive functioning (Kohn & Rosman, 
1973a). It seems reasonable therefore to as- 
sume that the effect of social-emotional 
functioning on cognitive functioning has al- 
ready been taken into account when pre- 
school cognitive functioning is related to ele- 
mentary school achievement. 

Sequence 2, in which the social-emotional
variables are entered first, yields the maximum
amount of differentiated information.
The four cognitive factors contribute a significant
increment ranging from 17% to
23% over and above the social-emotional
variables; Cognitive Factors I and IV account
for unique variance over and above
the other cognitive variables as well as the 
preceding set of social-emotional variables. 
The Stanford-Binet adds an additional 3%- 
6% to the predictions of second-grade 
achievement. 

Why stress the social-emotional variables 
as predictors of school achievement when 
these factors are deeply embedded in the 
preschool cognitive variables, themselves 
the strongest predictors of academic success?
The data in Table 2 have important
implications for remediation; intervention
must differentiate between the child whose
poor achievement stems from cognitive deficits
and the child whose learning problems
are largely accounted for by social-emotional
disturbance.

The final lines of Table 2 show the R²%
and the multiple correlations, as well as the
TABLE 2

JOINT EFFECTS OF THE PSYCHOLOGICAL VARIABLES (COGNITIVE AND SOCIAL-EMOTIONAL FUNCTIONING)

                                          Achievement criteria
                                          Word Knowledge        Reading               Arithmetic
Independent variables            df       R²%       rp          R²%       rp          R²%       rp

Sequence 1: Social-emotional variables over and above cognitive variables
  Cognitive variablesᵃ           5/202    41.4***               41.7***               40.7***
  Social-emotional variables     3/205    .6                    3.3*                  .9
    I Apathy-Interest                               −.02                  .04                   [illegible]
    II Anger-Cooperation                            .07                   .09                   .06
    III Task Orientation                            −.10                  [illegible]           [illegible]

Sequence 2: Cognitive variables over and above social-emotional variables
  Social-emotional variables     3/205    15.5***               21.7***               16.1***
  Cognitive variablesᵃ           5/199    26.5***               23.3***               25.5***
    Cognitive factors            4/200    (23.0***)             (16.8***)             (20.4***)
      I Visual Cognition                            [illegible]           .30***                [illegible]
      II Expressivity                               .03                   −.01                  .12
      III Motor Control                             −.06                  −.04                  .00
      IV Verbal Cognition                           [illegible]           [illegible]           [illegible]
    Stanford-Binet               1/199    (3.5***)              (6.5***)              (5.1***)

Joint effects of social-emotional and cognitive variables
  R²                                      42.0***               45.0***               41.6***
  Multiple regression                     .65                   .68                   .68
  Corrected R²                            40.2                  43.6                  39.7
  Corrected multiple regression           .63                   .66                   .68

Note. Abbreviation: rp = partial correlation with all other variables within the set and in the
preceding sets partialed out. Multiple correlations squared given in parentheses for cognitive factors
and the Stanford-Binet are subtotals of multiple correlations squared for the whole class of cognitive
variables.
ᵃ Variance due to preschool test age partialed out; variance not included (df = 1).
* p < .05.
** p < .01.
*** p < .001.


R²% and the multiple correlation values
corrected for shrinkage (McNemar, 1962),
based on all predictors. The social-emo- 
tional variables and the cognitive variables 
together account for 40% to 44% (cor- 
rected) of the total variance, and the corre- 
sponding multiple correlation values are be- 
tween .63 and .66. 
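
For readers unfamiliar with the shrinkage correction, a minimal sketch follows. It uses the commonly cited adjustment 1 − (1 − R²)(N − 1)/(N − k − 1); the text cites McNemar (1962) without printing the formula, so the exact variant the authors applied is an assumption here, and the input values are illustrative rather than taken from the tables.

```python
import math


def shrunken_r2(r2, n, k):
    """Adjust R² for shrinkage, given n cases and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Illustrative values only: R² = .50 from 10 predictors and 101 cases.
r2_adj = shrunken_r2(0.50, n=101, k=10)
r_adj = math.sqrt(r2_adj)
print(round(r2_adj, 4), round(r_adj, 4))
```

The correction grows as the number of predictors approaches the number of cases, which is why it stays small when predictors are few relative to sample size.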


Joint Effect of Psychological and 
Background Variables 

Table 3 shows that the class of background-
demographic variables adds a significant
but relatively small proportion of
variance, over and above the psychological
variables, to the predictions of Reading and
Arithmetic but not to the prediction of
Word Knowledge. Since background-demo-
graphic variables account for a major pro-
portion of preschool cognitive functioning
(Kohn & Rosman, 1973a), it seems reasona-
ble to assume that their effect has already
been taken into account when preschool
cognitive functioning is related to elemen-
tary school achievement.

The three significant partial correlations
between the achievement measures and so-
cial class and welfare status suggest either
(a) a cumulative deficit for children from
deprived backgrounds, as suggested by


TABLE 3

JOINT EFFECTS OF PSYCHOLOGICAL AND BACKGROUND-DEMOGRAPHIC VARIABLES

                                             Achievement criteria
                                             Word Knowledge        Reading               Arithmetic
Independent variables               df       R²%       rp          R²%       rp          R²%       rp

Preschool psychological variablesᵃ  8/199    42.0***               45.0***               41.6***
Background-demographic variablesᵇ   5/193    2.5                   4.9**                 5.2**
  Social class                                         .18**                 .23**                 .11
  Welfare status                                       .02                   .03                   .20**
  Family size                                          −.06                  [illegible]           .08
  Family intactness                                    −.01                  .02                   .02
  Race                                                 .02                   .06                   .12
Total R²                                     44.5***               49.9***               46.8***
Multiple regression                          .67                   .71                   .68
Corrected R²                                 40.6                  46.6                  43.1
Corrected multiple regression                .64                   .68                   .66

Note. Abbreviation: rp = partial correlation with all other variables within the set and in the
preceding sets partialed out.
ᵃ Social-emotional plus cognitive functioning. Variance due to preschool test age partialed out;
variance not included (df = 1).
ᵇ Variance due to second-grade test age partialed out; variance not included (df = 1).
* p < .05.
** p < .01.
*** p < .001.


Whiteman and Deutsch (1968), (b) selective
school placement with children from
poorer backgrounds going to "worse"
schools and therefore learning less, or (c)
unique achievement variance associated
with social class and welfare status.

The final lines of Table 3 show the total
R²% and the multiple correlations as well
as the R²% and multiple correlation values
corrected for shrinkage (McNemar, 1962),
based on all predictors. The shrinkage
is not great due to the relatively small
number of predictors in relation to sample
size. The three criteria are about equally
well predicted. Proportions (corrected) of
the total variance accounted for range from
41% to 43%.


DISCUSSION


The finding that performance on cognitive
tasks contains two sources of variability,
namely social-emotional and cognitive
functioning, is in line with the position
taken by Ziegler (1966) that performance
on intelligence tests may be conceptualized
as reflecting three distinct factors: (a) formal
cognitive process, (b) informational
achievement (content rather than the formal
properties of cognition), and (c) personality
variables.

A number of recent studies (Kohlberg,
1968; Levenstein, 1969; Massimo & Shore,
1963; Ziegler & Butterfield, 1968) have
yielded evidence that gains in cognitive
performance can be primarily due to the
social-emotional components. What is
needed is a clear delineation of those operations
leading to changes in the cognitive
components in contrast to those operations
leading to changes in the noncognitive components
of achievement measures.

The group of studies leading up to and
including the present work suggests ways in
which social-emotional functioning may affect
cognition. It has been theorized (Kohn
& Rosman, 1973a) that Factor I (Interest-
Participation versus Apathy-Withdrawal)
acts on cognitive functioning in the following
ways:

1. The child who scores high on Interest-
Participation learns more from his environment
because of his curiosity, assertiveness,
and high rate of social interaction; the
child high on Apathy-Withdrawal learns 
less from his environment because of his 
diminished contact with environmental 
stimuli and low rate of social interaction. 

2. Children high on Interest-Participa- 
tion are mentally more alert and more 
likely to engage in active thought processes 
such as hypothesis formation and testing, 
attention, discrimination, etc. Apathetic-
Withdrawn children are mentally more inert 
and less inclined to make sense about what 
goes on around them; in fact, they may 
avoid thinking. 

As for the processes intervening between 
Factor III (Task Orientation) and cognitive
functioning, hypotheses analogous to
those formulated for Factor I seem plausi- 
ble as follows: 

1. The child who scores high on Task 
Orientation learns more from his environ- 
ment because he is organized and system- 
atic in his contacts; the child scoring low on 
Task Orientation, because of his hyperac- 
tivity and restlessness, does not have suffi- 
ciently prolonged contact with specific ob- 
jects and events in his environment to learn 
anything about their properties. 

2. The child high on Task Orientation is 
more organized not only in the way he con- 
tacts his environment but also in the way 
he thinks; the child whose behavior is dis- 
organized, hyperactive, and restless has 
thought processes that share these charac- 
teristics. 

In summary, the child most likely to utilize
his cognitive processes in a productive
way and to make gains in cognitive 
achievement is one who is active, assertive, 
curious, task involved, and well organized. 
Further research is needed to show that this 
kind of child does achieve in school and 
that, in fact, children scoring high on Inter- 
est-Participation and Task Orientation do 
exhibit these behaviors in the classroom and 
do engage in these kinds of thought proc- 
esses. 


Components of Social-Emotional 
Functioning 


Social-emotional functioning relevant to
intellectual functioning is hypothesized to
consist of two components: (a) a personality
predisposition, that is, a person-specific
component and (b) a situation-specific 
component. 

In previous research, the factor dimen- 
sions showed moderate stability longitudi- 
nally (Kohn & Rosman, 1972a) and across 
settings (Kohn & Rosman, 1973b). The present
study demonstrated significant correla-
tions between preschool personality func- 
tioning and second-grade achievement— 
further evidence of the presence of a per- 
son-specific component. This component 
suggests that over a relatively wide range 
of learning conditions, children who are 
well adjusted on social-emotional Factors I 
or III or both learn more, and children who 
are disturbed on these dimensions learn less. 

A previous study (Kohn, 1968) provided 
indirect evidence for a situation-specific
component. The study showed large center- 
to-center differences in the average amount 
of Apathy-Withdrawal and Anger-Defi- 
ance, differences that could not be ac- 
counted for by differences in the back- 
ground of the children in these centers. 

Prescott and Jones' (1967) analysis of 
day care environments suggests some rea- 
sons for large center-to-center differences. 
Their data indicate that children's interest 
was a function of the encouragement, ap- 
proval and nurturance provided by the 
teachers as well as of the extent to which 
the lessons were designed to encourage crea- 
tivity, experimentation, “pleasure, awe, and 
wonder.” These findings imply that class- 
room and learning environments can be cre- 
ated to arouse optimal degrees of Interest- 
Participation. 

The situational and person-specific com- 
ponents of social-emotional functioning 
suggest two types of strategies for interven- 
tion to improve cognitive performance, one 
directed at creating optimal learning envi- 
ronments and the second designed to im- 
prove the person-specific component of the 
child’s social-emotional functioning. 

Intervention aimed at the person-specific 
component would involve a therapeutic ap- 
proach for the child who is disturbed in 
learning-relevant ways, that is, high on Apathy-
Withdrawal and/or low in Task
Orientation.

A pilot study (Kohn & Rosman, 1971) has
suggested that the children high in Apathy-
Withdrawal have experienced a great deal
of parental control and are made fearful in
contacting the environment; children low in
Task Orientation have experienced rejection,
neglect, and a disorganized family life.
This pilot study suggests that long-sustained
corrective emotional experiences
would seem to be in order for both kinds of
children to remediate both their
personality disturbances and their cognitive
functioning.

This study applied a method of generating task hierarchies (ordering
theory) to seven Piagetian tasks. Thirty subjects were individually 
administered three concrete operational and four formal operational 
tasks. Analysis of the response patterns on the dichotomously scored 
tasks revealed the following findings. First, the Piagetian theory that 
success on concrete operational tasks is a necessary prerequisite to 
success on formal operational tasks was confirmed. Second, the seven 
tasks were closely interrelated with an array of prerequisite relations 
more complex than a simple linear hierarchy. The methodology uti- 
lized appears to have value for defining nonlinear lines of implication 
among behavioral science phenomena. 


The purpose of this study was to expli- 
cate ordering theory, a measurement model 
which identifies necessary and sufficient 
performance conditions between test items 
or tasks, by analyzing seven Piagetian 
tasks. Ordering theory is an extension of 
Guttman (1944, 1950) scaling procedures in 
that it identifies nonlinear as well as linear 
hierarchies among items or tasks. Scalo- 
gram analysis is used to order a group of 
items or tasks into a linear hierarchy and to 
evaluate whether or not the hierarchy is 
unidimensional and cumulatively hierarchi- 
cal. The degree to which a group of items or 
tasks is judged to be unidimensional and 
cumulative is determined by the extent to 
which “passes” (scores of 1) on any item or
task co-occur with passes on all items or
tasks ranked as less difficult. The inverse is 
also true; that is, a hierarchy is unidimen- 
sional and cumulative insofar as "failures" 
(scores of 0) on an item or task co-occur 
with failures on items ranked as more diffi- 
cult. Scalogram analysis, however, is con- 
strained to the identification of linear hier- 
archies. A linear hierarchy is one in which a 
series of test items or tasks are arranged in
a hierarchy such that one and only one item 
or task appears at any given level of the 
hierarchy and one and only one task is im- 
mediately prerequisite to or a sufficient con- 
dition for any other item or task. 

Ordering theory is a measurement model 
which is based upon scalogram analysis but 
which extends scalogram techniques to non- 
linear item or task hierarchies. Ordering 
theory is an approach to fundamental 
measurement which has as its primary in- 
tent either the determination of a hierarchy 
for a set of test items or tasks or the testing 
of a hypothesized hierarchy among a set of 
items or tasks (Airasian & Bart, 1973). At 
a foundational level of explanation, order- 
ing theory is a measurement model with a 
Boolean algebraic framework in which item 
or task response patterns are viewed as atoms
in a Boolean algebra with as many
generators as there are items or tasks being 
considered (Goodstein, 1963). Though there 
are a variety of ways of articulating order- 
ing theory, it is probably most meaning- 
fully classified as a deterministic measure- 
ment model which is useful for identifying 
both linear and nonlinear, qualitative pre- 
requisite relations among test items or 
tasks. In this regard, it is a more general 
case of scalogram analysis. 

Ordering-theoretic procedures were used
in this study to determine prerequisite rela- 
tions between pairs of Piagetian tasks. The 
prerequisite relation is considered here, 
since that type of relation is a primary interest
to behavioral scientists, especially in
their quest to identify causal relationships 
among phenomena. 

An item i is a prerequisite to an item j to 
the extent that the (0,1) response pattern, 
where 0 represents the score on item i and 1 
represents the score on item j, occurs infre-
quently. The (0,1) response pattern is
viewed as a disconfirmation that a correct 
response to item i is a prerequisite to a 
correct response to item j. In defining pre- 
requisite relations between items, ordering 
theory shares one limitation with scalogram 
analysis. Both measurement models are de-
terministic rather than probabilistic. Thus,
neither model incorporates a method of 
dealing with the probability of encounter- 
ing random error in item response patterns. 
As a consequence of this limitation, order- 
ing-theoretic analyses rely upon the use of 
a preset tolerance level for error. The toler- 
ance level sets the number of disconfirma- 
tory response patterns which will be ac- 
cepted in defining a prerequisite relation 
between two items. Thus, for a 5% toler- 
ance level and n subjects, one would toler- 
ate at most .05 n disconfirmatory response 
patterns between items in an item pair be- 
fore accepting a prerequisite relation. 

The logical relation between any two 
items such as the prerequisite relation can 
thus be identified from an examination of 
the magnitude of the cell frequencies in the 
2 × 2 matrix relating the item scores. Thus,
for a 5% tolerance level and n subjects re-
acting to items i and j, an absence of sub-
jects (i.e., <.05n) in only the (0,1) cell for items i and j
indicates that i is a logical precondition of
j, an absence of subjects (i.e., <.05n) in both the
(0,1) and (1,0) cells indicates that items i and
j are logically equivalent, and the presence
of subjects (i.e., >.05n) in all cells indi-
cates that items i and j are logically inde-
pendent. The logical relation of prereq-
uisite and its related logical relations of
equivalence and independence are the cru- 
cial relations used in ordering theory. 
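
The decision rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the function name `classify_relation` and its arguments are hypothetical, a cell is treated as "absent" when its frequency does not exceed the tolerance level times n, and only the 01 and 10 cells are inspected (the 00 and 11 cells are assumed to be occupied).

```python
def classify_relation(n01, n10, n_subjects, tolerance=0.05):
    """Classify the relation between items i and j from two cell counts.

    n01: subjects scoring 0 on item i and 1 on item j
         (these disconfirm "i is a prerequisite to j")
    n10: subjects scoring 1 on item i and 0 on item j
         (these disconfirm "j is a prerequisite to i")
    Assumption: a cell counts as empty when its frequency is at most
    tolerance * n_subjects.
    """
    limit = tolerance * n_subjects
    empty01 = n01 <= limit
    empty10 = n10 <= limit
    if empty01 and empty10:
        return "equivalent"
    if empty01:
        return "i prerequisite to j"
    if empty10:
        return "j prerequisite to i"
    return "independent"


# Hypothetical counts for 100 subjects at a 5% tolerance level.
print(classify_relation(2, 40, 100))   # few (0,1) patterns, many (1,0) patterns
print(classify_relation(2, 3, 100))    # both off-diagonal cells nearly empty
print(classify_relation(20, 30, 100))  # both off-diagonal cells occupied
```

Treating the two off-diagonal cells symmetrically means a single call reports the direction of the prerequisite relation as well as its presence.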

There are two general strategies for the 
implementation of tolerance levels in exam- 
ining the hierarchical structure or ordering 
among a set of items. Within one strategy, 
a hierarchy with its array of prerequisite 
relations is hypothesized for a set of items 
or tasks. The response patterns that would 
be disconfirmatory for the entire hierarchy 
are identified and a tolerance level is estab-
lished (Airasian, 1971a, 1971b). The hy-
pothesized hierarchy is then accepted if the
frequency of obtained disconfirmatory re-
sponse patterns is less than or equal to the
prescribed tolerance level. 
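
As a rough sketch of this first strategy, the following Python fragment counts the response patterns that disconfirm a hypothesized hierarchy and compares the count with the tolerance level. The item names, the response data, and the helper names `disconfirms` and `accept_hierarchy` are all hypothetical and serve only to illustrate the rule; they are not taken from the cited procedures.

```python
def disconfirms(pattern, prerequisites):
    """True if a 0/1 response pattern violates any hypothesized relation.

    pattern: dict mapping item name -> score (0 or 1)
    prerequisites: list of (i, j) pairs meaning "i is a prerequisite to j"
    """
    return any(pattern[i] == 0 and pattern[j] == 1 for i, j in prerequisites)


def accept_hierarchy(patterns, prerequisites, tolerance=0.05):
    """Return (accepted, number of disconfirmatory patterns)."""
    n_disconfirming = sum(disconfirms(p, prerequisites) for p in patterns)
    return n_disconfirming <= tolerance * len(patterns), n_disconfirming


# Hypothetical hierarchy: item A is a prerequisite to items B and C.
hierarchy = [("A", "B"), ("A", "C")]

# Hypothetical responses from 40 subjects; only the last pattern
# (B passed without A) disconfirms the hierarchy.
responses = (
    [{"A": 1, "B": 1, "C": 0}] * 25
    + [{"A": 1, "B": 0, "C": 0}] * 10
    + [{"A": 0, "B": 0, "C": 0}] * 4
    + [{"A": 0, "B": 1, "C": 0}] * 1
)

accepted, n_bad = accept_hierarchy(responses, hierarchy, tolerance=0.05)
print(accepted, n_bad)
```

With 40 subjects a 5% tolerance level admits two disconfirmatory patterns, so the single violation above leaves the hypothesized hierarchy accepted; at a 0% tolerance level it would be rejected.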

An alternative strategy, used when no a
priori hierarchy among items is hypothe-
sized, identifies prerequisite relationships be-
tween item pairs. In this strategy, all possi-
ble item pairs are investigated to identify
prerequisite relations. The prerequisite rela-
tion for a particular pair of items is ac-
cepted if the frequency of obtained discon-
firmatory response patterns for the item
pair is less than or equal to the frequency
of such response patterns established by the 
tolerance level. This procedure is followed 
to test each of the possible hypothesized 
prerequisite relations with the same toler-
ance level being used for each testing.
Within this strategy, item response patterns
that disconfirm one prerequisite relation
may be different from the item response
patterns that disconfirm another prerequi-
site relation.

Several discussions of ordering theory
have been provided. Airasian and Bart
(1973) articulated the general nature of or-
dering theory. Bart and Krus (1973) de-
scribed an ordering-theoretic technique by
which item or task hierarchies could be de-
termined. Airasian, Bart, and Greaney
(1973) investigated a propositional logic
game and compared the ordering-theoretic
results to the results of scalogram analyses.
Two defining properties of ordering theory
cited in these discussions are that all test 
items or tasks to be examined must be di- 
chotomously scored and that all subjects in 
a sample must respond to all of the items or 
tasks. 

A computer program which searches out 
all of the prerequisite relations between all 
item pairs has been developed (Lele & 
Bart, 1971). The program accepts a prereq- 
uisite relation between an item pair if the 
frequency of obtained disconfirmatory re- 
sponse patterns is less than or equal to the 
frequency of such response patterns estab-
lished by the tolerance level. This procedure is
followed to test for a prerequisite relation 
between all possible item pairs, with the 
same tolerance level used for each testing. 
In this study, the computer program was 
used to examine the hierarchy among seven 
Piagetian tasks that tested for various lev- 
els of operational reasoning. 


METHOD 


Subjects 


The subjects were 30 high school freshmen, 16
males and 14 females, whose mean age was 14.1
years, with a range from 13.0 years to 15.1 years.
The subjects were enrolled in a high school in the 
Chicago area. 


Procedure 


Seven tasks that tested for operational schemes 
which are general internal forms of knowing 
(Furth, 1967) were administered to each of the 30 
subjects in the following manner: each subject 
was tested individually with the seven tasks being 
given one at a time in a prescribed order over a
75-minute period. The order of task presentation 
was as follows: 


1. matrix task,
2. tactile seriation task,
3. animal classification task,
4. equilibrium in the balance task,
5. projection of shadows task,
6. oscillation of the pendulum task, and
7. conservation of motion in a horizontal
plane task.


The first three tasks tested concrete reasoning and 
were derived from The Early Growth of Logic in 
the Child (Inhelder & Piaget, 1969). The other
four tasks tested formal reasoning and were derived
from The Growth of Logical Thinking from Child-
hood to Adolescence (Inhelder & Piaget, 1958).
For each task, the responses of each subject were
independently judged and scored by two raters.

Figure 1. Eight cards used in the matrix task.

1. Matrix task. The matrix task involved the
use of eight cards. On one card, there were five
figures and on each of the other cards there was
only one figure. The eight cards with their figures
are shown in Figure 1. This display is identical to 
the way in which the cards were shown to each 
subject. 

In this task, the subject was shown the eight 
cards with their circle, triangle, and square figures 
and then was asked 


Which of the seven figures here [the experi- 
menter points to the figures on the seven small 
cards] best fits in this blank space [the experi- 
menter points to the blank space on the large 
card] so that it fits this way (horizontally) and 
that way (vertically)? 


The correct answer is the figure on the small card 
in the second row and the first column of the small
card array. The subject was then asked “Why
did you choose that one?” The correct answer for 
this question would indicate that the correct figure 
is a triangle because the row that it is in has a 
triangle, that the figure is uncolored because the 
column that it is in has only uncolored figures, 
and that the rows are determined by shape and the 
columns are determined by color or degree of 
shading. After the subject’s response to that ques- 
tion was assessed, the subject was then asked “Tell 
me whether any of the other figures might fit 
in as well or better?” This question tested the 
stability of the choice of figure indicated in the 
response to the first question. The correct answer 
to this question was basically that no other figure 
fitted in the space well. 

The two defined scores for this task were 1 and 
0, with 1 being given if the sum of the questions 
answered correctly according to the two raters was 
either 5 or 6 and 0 being given if that sum was 
less than 5. A score of 1 for this task was an in-
dication that a subject had the scheme of multi-
plicative classification which is proper to the
period of concrete operations. For this task, the 
raters had point value agreement of 86 of the 90 
question responses. The percentage of interrater 
agreement for the question responses for this task 
was used as an index of interrater reliability and
had the value of .96.

2. Tactile seriation task. In this task, the subject 
was encouraged to examine by touch five pencils
which were hidden from sight by a screen. The sub-
ject was then requested to order the pencils from
longest to shortest. The two raters assessed the 
drawing of the subject and the correct reply for 
that request would be a drawing of the pencils 
such that the bases of the pencil figures would be 
on the same horizontal line and the pencils would 
be in order from longest to shortest with 5 pencil 
figures being indicated. 

The defined scores for this task were 1 and 0, 
with 1 being given if the subject correctly replied
to both task assignments according to both raters 
and 0 being given otherwise. A score of 1 for this 
task was an indication that a subject had the 
scheme of simple seriation which is proper to the 
period of concrete operations. For this task, the
interrater agreement was 1.00. 

3. Animal classification task. In this task, the 
subject was shown 12 cards with a different animal 
picture on each card. Four ducks, three other 
birds (chicken, sparrow, and parrot), and five 
animals that are not birds (mouse, fish, horse, 
poodle, and snake) were depicted on the cards. The 
subject was then asked to make separate piles, 
placing the animals which were like each other in a 
pile. After that activity, the subject was asked the 
following questions in this order: 


Are there more ducks or more birds? 

Are there more birds or more animals? 

If a fox ate all the birds, would there be any 
ducks left? 

What if one killed all the animals, would there 
be any birds left? 


Two raters independently assessed the responses 
to these questions as to their correctness; the 
correct answers were “more birds,” “more ani-
mals,” “no,” and “no” to the four questions, respec-
tively. The two defined scores for this task were 1
and 0, with 1 being given if the sum of questions 
answered correctly according to the two raters was 
either 7 or 8 and 0 being given if that sum was less 
than 7. A score of 1 on this task indicated that a 
subject had the scheme of additive classification 
which is proper to the period of concrete opera-
tions. For this task, perfect interrater agreement
was reached.

4. Equilibrium in the balance task. In this task, 
a conventional balance with an array of six weights
was used. Inhelder and Piaget's (1958) procedures 
for questioning the subject and for assessing the 
responses for this task were used in this study. 
Thus, subjects were assessed to be at various sub- 
stages: 1-A, 1-B, 2-A, 2-B, 3-A, and 3-B, with 
Substage 3-B being the highest substage as a 
consolidated level in the period of formal opera- 
tions. The raters independently awarded points to 
subject responses according to the following con- 
vention: 1 point for substage 1-A and 1 additional 
point for each successive substage with 6 points for 
substage 3-B. The two defined scores for this task 
were 1 and 0, with 1 being given if the subject
received a combined rater point total of 10 or more
and 0 being given if that total was less than
10. Since the array of substages used in the other
three formal reasoning tasks were very similar in 
form to the substages for this task, the same 
quantification procedure for the attribution of 
task scores was used for the other three tasks. A
score of 1 on this task indicated that a subject had
the scheme of proportionality which is proper to
the period of formal operations. For this task, the 
rater agreement was .87. 

5. The projections of shadows task. This task
involved a screen, an electric light, a baseboard
with a sequence of holes set at one-inch distances,
and four rings with diameters of one, two, three, 
and four inches, respectively. Basically, the task 
of the subject was to indicate what distance from 
the light a given ring should be placed so that its 
shadow on the screen conformed to the shadow
produced by another ring placed at a given
distance. Inhelder and Piaget's (1958) procedures
for interviewing the subject and for assessing the
responses for this task were used in this study.
The score of 1 for this task indicated that a sub- 
ject had the scheme of proportionality which is 
proper to the period of formal operations. For this 
task, interrater agreement was .90. 

6. The oscillations of the pendulum task. This
task consisted of a string which could be
lengthened or shortened, a wooden apparatus to
hang the string, and a set of weights. Basically, the 
task of the subject was to identify the factor that 
regulated the rate of oscillation of the pendulum. 
Inhelder and Piaget’s (1958) procedures for ques- 
tioning the subject and for assessing the responses 
for this task were used. A score of 1 for this task 
indicated that a subject had the scheme of com- 
binatorial reasoning which is proper to the period 
of formal reasoning. The interrater agreement for 
this task was .87. 

7. Conservation of motion in a horizontal plane 
task. This task involved a long board with a central 
groove, a thick plastic pendulum hammer fixed at 
one end of the board, and six balls, each of which 
would be projected down the groove by the action 
of the hammer head. Basically, the task of the 
subject was to identify the factors that regulated 
the distances balls traveled down the groove when 
struck with a constant force by the hammer. In-
helder and Piaget's (1958) procedures for question-
ing the subject and for assessing the responses for
this task were used in this study. A score of
1 for this task indicated that a subject had the
scheme of combinatorial reasoning which is proper
to the period of formal operations. For this task,
interrater agreement was .83. Though this value
was the lowest measure of interrater reliability, it
still was quite high and thus substantial interrater
reliability was established for this task.

For each of the seven tasks used, procedures 
for the dichotomous scoring of the task responses 
were thus established. From the employment of
these tasks and the highly reliable response
assessment procedure, a 30 × 7 task score matrix
with 1s and 0s was generated with the rows
referring to the 30 subjects and the columns re-
ferring to the seven tasks.

TABLE 1

Item Response Patterns for Seven Piagetian
Tasks for a Sample of 30 Subjects

Item response              Task
patterns         1   2   3   4   5   6   7   Frequency
1                0   0   0   0   0   0   0       1
2                0   0   1   0   0   0   0       1
3                1   1   1   0   0   0   0       3
4                1   1   1   1   0   0   0       7
5                1   1   1   0   0   1   0       7
6                1   1   1   1   0   1   0       6
7                1   1   1   1   1   1   1       5


RESULTS 


The task score matrix for the 30 subjects
was generated and was used in an array of
analyses that provided some comparisons of
ordering-theoretic techniques with other
more conventional data analytic tech-
niques. One basic use of the task score ma-
trix was to identify the basic item response
patterns that occurred.

As can be seen in Table 1, seven distinct
item response patterns were generated by 
the 30 subjects. In addition, the subjects 
tended to perform fairly well on the tasks 
since most of the total task scores for the 
subjects were more than 3. 

A scalogram analysis was performed on 
the item response patterns shown in Table 
1. For the data in Table 1, the coefficient of 


TABLE 2 


NuMBER or SUBJECTS IN THE 10 AND 01 CELLS BETWEEN Bach 
One Task More DIFFICULT THAN THE OTHER Task) 


Numper or Sussects WHo Founp 
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reproducibility was .93, indicating the tasks 
constituted, with some error, a linear hier- 
archy. Given that a linear hierarchy model 
did not accurately fit the task data in Ta- 
ble 1 and given that Guttman’s (1950) scal- 
'ogram analysis is primarily limited to the 
fitting of item data to a linear hierarchical 
model, ordering-theoretic procedures were 
used to determine the ordering among the 
Piagetian tasks. 

From Table 1, the frequencies in the 01 
and 10 cells for each pair of tasks were 
determined and reported in Table 2. As has 
been noted, those frequencies are crucial in 
determining the logical relations among the 
tasks, Those logical relations between tasks 
are also indicated in Table 2. Thus, for 
example, one can infer from Table 2 that 
Tasks 4 and 6 are logically equivalent. 

If tolerance levels are established, then 
the tolerance levels will determine the max- 
imal frequencies either in the 01 cell or in 
the 10 cell that may occur in order for the 
prerequisite relation to be accepted for any 
given task pair. Though the ordering-theo- 
retic program used to analyze this task 
data does not actually generate the inter- 
task score contingency tables needed for the 
ordering-theoretic data analysis to occur, 
those contingency tables could be used to 
formulate the hierarchy among the tasks 
for any tolerance level in accordance with 
the rules for determining prerequisite rela- 
tions that have been discussed. 

With the use of a computer program, the 


OF THE SEVEN OPERATIVITY Tasks (THE 


Task* 
Task 2 3 4 5 6 1 
LU Aot MM ESS Lm 
10 | ot 10 | o 10 | ot 10 | o1 | 10 01 10 01 
i 0 
1. Matrix 0 0^ 0 1 10 0 23 0 10 0 23 
2. Seriation 0 1 10 0 23 0 10 0 23 0 
3. Animal ll 0 24 0 11 0 24 0 
4. Balance 13 0 7 © 13 2 
5. Shadow 0 13 P . 
6. Pendulum 


a Zero in only the 01 or only 10 cell, but not bot 


h, indicates precondition relationship between tasks. 


» Pattern indicates logical equivalence between tasks. 


* Pattern indicates logical independence of tas 


ks. 
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task response patterns were subjected to an 
ordering-theoretic analysis with a 1% toler-
ance level being established. Hence, pairs of
tasks, where greater than 1% disconfirma- 
tory response patterns were found, were re- 
jected as manifesting a prerequisite rela-
tion. In the case of 30 subjects, a 1% toler-
ance level established a frequency of .3 as
the maximal frequency that either the 01
cell or the 10 cell can manifest before the 
prerequisite relation is rejected. Upon ex- 
amination of disconfirmatory frequencies 
set by various tolerance levels for 30 sub- 
jects, a 1% tolerance level and a 0% toler- 
ance level were equivalent for 30 subjects in 
that no disconfirmatory response patterns 
were allowed before corresponding prereq- 
uisite relations were rejected. As a conse- 
quence, the identified ordering manifested 
perfect reproducibility. In essence, then, the 
response patterns for each student on the 
seven Piagetian tasks were examined to de- 
termine which tasks were prerequisites for 
the satisfactory accomplishment of other 
tasks. Figure 2 depicts the ordering deter- 
mined for the seven tasks. Note that the 
ordering is similar to a hierarchy. 
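
The pairwise analysis described above can be retraced from the frequencies in Table 1. The Python sketch below is not the Lele and Bart program; it is a minimal illustration in which the names `PATTERNS`, `cell_counts`, and `relation` are hypothetical and in which a cell is assumed to be "empty" when its frequency does not exceed the tolerance level times the number of subjects.

```python
from itertools import combinations

# Response patterns for Tasks 1-7 and their frequencies, as read from Table 1.
PATTERNS = [
    ((0, 0, 0, 0, 0, 0, 0), 1),
    ((0, 0, 1, 0, 0, 0, 0), 1),
    ((1, 1, 1, 0, 0, 0, 0), 3),
    ((1, 1, 1, 1, 0, 0, 0), 7),
    ((1, 1, 1, 0, 0, 1, 0), 7),
    ((1, 1, 1, 1, 0, 1, 0), 6),
    ((1, 1, 1, 1, 1, 1, 1), 5),
]
N = sum(freq for _, freq in PATTERNS)  # 30 subjects


def cell_counts(i, j):
    """Return (n10, n01) for Tasks i and j (task numbers start at 1)."""
    n10 = sum(f for p, f in PATTERNS if p[i - 1] == 1 and p[j - 1] == 0)
    n01 = sum(f for p, f in PATTERNS if p[i - 1] == 0 and p[j - 1] == 1)
    return n10, n01


def relation(i, j, tolerance=0.01):
    """Classify the relation between Tasks i and j at a tolerance level."""
    limit = tolerance * N
    n10, n01 = cell_counts(i, j)
    if n10 <= limit and n01 <= limit:
        return "equivalent"
    if n01 <= limit:
        return "i prerequisite to j"
    if n10 <= limit:
        return "j prerequisite to i"
    return "independent"


# All 21 task pairs at the 1% tolerance level used in the study.
RELATIONS = {(i, j): relation(i, j) for i, j in combinations(range(1, 8), 2)}

for pair, rel in RELATIONS.items():
    print(pair, cell_counts(*pair), rel)
```

Because 1% of 30 subjects is less than one subject, any nonzero frequency in a cell exceeds the limit, which is why the 1% and 0% tolerance levels give the same ordering for this sample.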

Figure 2 indicates prerequisite relations
as well as relations of logical equivalence
and logical independence between tasks in
various task pairs. Success on one task is a
prerequisite to success on another task if
the number representing the first task is
connected to the number representing the
second task by a single arrow. The
relation, “is a prerequisite to,” as well
as “is a precondition for” and “is a neces-
sary condition for” may be viewed as
synonymous. Also, if success on Task A is a
necessary condition for success on Task B,
then success on Task B is a sufficient condi-
tion for success on Task A. In other words,
the converse of the relation, “is a necessary
condition for,” is the relation, “is a suffi-
cient condition for.” When both of these
relations hold between two tasks, the
relation of “is a necessary and sufficient con-
dition for,” which is its own converse, holds
between the two tasks. These points are
made merely to indicate that there are
many ways of articulating the prerequisite
relation between tasks. The behavioral re-
searcher may use the way of articulating a
specific logical relation between two tasks
that would be most meaningful to him. An
example of a prerequisite relation among
the seven tasks as indicated in Figure 2 is
that success on Task 1 is a prerequisite to
success on Task 4.

Figure 2. Ordering diagram for seven Piagetian
tasks. (Double arrow ↔ indicates logical equiva-
lence and single arrow → indicates logical precon-
dition.)

A task is logically equivalent to another
task if success on one task co-occurs
with success on the other task. In other
words, the task is logically equivalent to
another task if success on the first task is a
necessary and sufficient condition for suc-
cess on the other task. Logical equivalence
between two tasks indicates that the
score for one task is the same as the
score for the other task for any test sub-
ject. A figurative representation for logical
equivalence between two tasks is a
double arrow ↔ connecting numbers that
are referents for the two tasks. An example
of two logically equivalent tasks as indi-
cated in Figure 2 are Tasks 1 and 2.

A task is logically independent of another
task if the score for one task is unrelated to
the score for the other task. Logical inde-
pendence between two tasks indicates that
each of the four possible response patterns
for the two tasks occurs at a frequency
greater than that established by a tolerance
level. Figuratively, one task is logically in-
dependent of another task if the number
representing the first task is not connected
to the number representing the second task
by a line that passes either in a general 
upward direction or in a general downward 
direction or horizontally. An example of 
two logically independent tasks as indi- 
cated in Figure 2 are Tasks 4 and 6. With 
the logical relations between tasks identi- 
fied, the item ordering can be constructed as 
indicated in the case of Figure 2. However, 
the ordering determined for a set of tasks 
will tend to vary with the tolerance level 
employed. For example, if a 5% tolerance 
level were used, the Figure 2 structure 
would be different with the graphical rela-
tion

1 ↔ 2 ↔ 3

then occurring for Tasks 1, 2, and 3. A 1%
tolerance level was used primarily on the 
grounds that the experimenters wanted to be 
as strict as possible as they allowed for a 
minimal amount of error. Also, in identify- 
ing prerequisite relations, the frequencies 
between the 01 and 10 cells will differ and in
most cases the difference will be significant.
However, if one employs a low tolerance
level such as 1%, a small difference between
the 01 and 10 cell frequencies for two tasks
may occur with a logical precondition rela-
tion holding for the two tasks; this situa-
tion occurred for Tasks 1 and 3 and for
Tasks 2 and 3. In general, with higher tol-
erance levels, a precondition relation im-
plies that it is more likely for one task to be 
significantly more difficult than the other 
task. With lower tolerance levels, logical 
equivalence between two tasks implies that 
it is more likely for the difficulties of the 
two tasks to be equal. 

One prominent finding from Figure 2 and 
Table 2 is that evidence is provided that 
the concrete operational schemes required
by Tasks 1-3 are necessary but not suffi- 
cient conditions for the formal operational 
schemes required by Tasks 4-7. In other 
words, for the subjects tested, classification 
and seriation schemes were prerequisites for 
the formal operational schemes of propor- 
tionality and combinatorial reasoning. That 
finding is compatible with Piagetian theory 
insofar as Piaget (e.g., Inhelder & Piaget, 
1958) in a host of books has contended that 
concrete reasoning is a precondition to for-
mal reasoning.

Another prominent finding is that evi- 
dence is provided that the seven tasks are 
closely interrelated with an array of pre- 
requisite relations indicating a hierarchical 
structure more complex than that of a sim- 
ple linear hierarchy. If the seven tasks were 
in a simple linear hierarchy, then any two 
tasks would be related by the prerequisite 
relation and there would be no two tasks 
that are logically independent. However, 
Tasks 4 and 6 were found to be logically 
independent for the sample tested. Within 
the hierarchy of the seven tasks, the two 
tasks (4 and 5) that assess the scheme of 
proportionality, the two tasks (6 and 7) 
that assess the scheme of combinatorial 
reasoning, and the two tasks (1 and 3) that 
assess classification skills each determine a 
linear scale. 
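
The coefficient of reproducibility reported for the scalogram analysis can be checked against the frequencies in Table 1. The text does not state which error-counting convention was used, so the Python sketch below rests on an assumption: errors are counted as deviations of each response pattern from the ideal scale pattern having the same total score, with tasks ranked by their number of passes and ties kept in task order. All variable and function names are hypothetical.

```python
# Response patterns for Tasks 1-7 and their frequencies, as read from Table 1.
PATTERNS = [
    ((0, 0, 0, 0, 0, 0, 0), 1),
    ((0, 0, 1, 0, 0, 0, 0), 1),
    ((1, 1, 1, 0, 0, 0, 0), 3),
    ((1, 1, 1, 1, 0, 0, 0), 7),
    ((1, 1, 1, 0, 0, 1, 0), 7),
    ((1, 1, 1, 1, 0, 1, 0), 6),
    ((1, 1, 1, 1, 1, 1, 1), 5),
]
N_TASKS = 7
N = sum(freq for _, freq in PATTERNS)  # 30 subjects

# Rank tasks from easiest to hardest by number of passes.
# sorted() is stable, so tied tasks keep their original order.
passes = [sum(f * p[t] for p, f in PATTERNS) for t in range(N_TASKS)]
order = sorted(range(N_TASKS), key=lambda t: -passes[t])


def scale_errors(pattern):
    """Count mismatches with the ideal pattern of the same total score."""
    total = sum(pattern)
    ideal = [0] * N_TASKS
    for t in order[:total]:  # an ideal subject passes the easiest tasks
        ideal[t] = 1
    return sum(a != b for a, b in zip(pattern, ideal))


errors = sum(f * scale_errors(p) for p, f in PATTERNS)
reproducibility = 1 - errors / (N * N_TASKS)

print(errors)                     # 14 errors out of 210 responses
print(round(reproducibility, 2))  # 0.93
```

Under this convention all of the scale errors come from the seven subjects who passed Task 6 without passing Task 4, the same pair of tasks that the ordering-theoretic analysis identified as logically independent.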


DISCUSSION 


The purpose of this research was not to 
determine the universal and eternal order- 
ing of the seven Piagetian tasks but rather 
to demonstrate how ordering-theoretic pro-
cedures can be used to determine the pat-
tern of logical relations among a class of
Piagetian cognitive processes—in this case, 
the schemes required by seven Piagetian 
tasks. The ordering determined for the 
seven Piagetian tasks used held for the 30 
subjects tested and was compatible with 
Piagetian theory, but many more samples 
of subjects from around the world would 
have to be assessed with the seven tasks 
and the data analyzed with ordering-theo-
retic procedures in order to establish the
level of universality and population gener- 
alizability that the ordering has. 

However, it should be clear that the pos-
sible uses of ordering-theoretic methods in
education and other social sciences are
myriad. Whenever logical relations between 
phenomena or events are of interest to a 
researcher, ordering theory can and should
be used. Traditional psychometric theory
has given us a family of techniques that
allow us to discern patterns of association 
in terms of correlations. However, such 
techniques do not greatly help us to deter-
mine lines of causation. Ordering theory
can reveal lines of implications among phe- 
nomena, which in turn can serve as a basis 
for hypothesizing lines of causation to be 
tested in experimental settings. Ordering 
theory should be a useful tool to the educa- 
tor or social scientist. 

An example of the educational utility of 
ordering theory lies in its potential relation- 
ship to the task analysis method of Gagne 
(1965). Basically, some primary task com- 
petency is designated to be an important 
educational objective and then a hierarchy 
of prerequisite tasks underlying the pri- 
mary task is generated in an a priori man- 
ner. One intent of such a task hierarchy is 
that it indicates a path by which educa- 
tional development can proceed with the 
end being the attainment of the primary 
task competency. Thus, the Gagne method 
has utility for formulating instructional 
and curriculum-sequencing decisions. The 
functions of ordering theory for the Gagne 
method of task analysis lie in the empirical 
determination of task hierarchies which can 
be guides for instructional and curricular 
decisions and in the empirical testing of a 
hypothesized task hierarchy (Airasian,
1971a). Thus, ordering theory would be an 
aid to the fields of curriculum and instruc- 
tion. 

As for developmental psychology and
Piagetian theory and research, ordering
theory should be helpful in providing infor- 
mation on hierarchical structures among 
behaviors which, in turn, constitute a com- 
mon and useful way to conceptualize be- 
haviors from a developmental viewpoint. In 
the case of this study, the hierarchy among 
the cognitive behaviors assessed was very 
compatible with the hierarchy among the 
same cognitive behaviors which has been
posited by Piaget. It is anticipated that
other cognitive behavior hierarchies posited
by Piaget will be empirically verified by
ordering-theoretic methods.
Emory University


Five factors of socialization were origin: 
of poor blacks and whites (86 boys and 
(1) verbal facility, (2) coping with anxie 


with anxiety by aggression, (4) alienation, 
were followed up by tests of reading readiness (kindergarten 


by Factor 1 (except for end of first-grade 


for kindergarten gir! 
grade levels; and by 


tor 5 had no statistical usefulness. Fac 


up i 


In a study of economically disadvan- 
taged, black and white children in metro- 
politan Atlanta begun in early 1967, Ri- 
chards and McCandless factor analyzed the 
results from a series of reasonably reliable 
tests, most of which have satisfactory va-
lidity associations (see Richards & McC-
andless, 1972, for the data and descriptions
of the tests). Five major factors in the so- 
cialization of inner-city, black and white, 
boys and girls were revealed. They are 
briefly sketched below. In the present 
study, the authors follow up as many of 
these children as could be located and make 
an analysis of the similarity of the present 
to the original sample (no real difference). 

Richards and McCandless (1972) re-
vealed five major factors in the socializa-
tion of inner-city, black and white, four-
and five-year-old boys and girls.

Factor 1 was called verbal facility. Ma- 
jor loadings are for teacher ratings (for all 
teacher ratings, see Goldstein & Chorost, 
1966) of verbal skills, quality of speech, 
and activity versus passivity of speech, 
with the addition of a loading on tested 
verbal intelligence. 

Factor 2 was called coping with anxiety
by withdrawal. Heaviest loadings are
teacher ratings for isolation, unhappiness,
fearfulness or tearfulness, “the silent child,”
and “the child with separation problems.”

Factor 3 was called coping with anxiety
by aggression. Highest loadings are for
teacher ratings of aggressive reactions, pro-
vocativeness to teachers, disruptiveness, low
cooperation, low restraint of motor activity,
and hyperactivity.

Factor 4 was called alienation. Highest
loadings are for the Children’s Self-Social
Constructs Test (Long & Henderson, 1967,
1968; Long, Henderson, & Ziller, 1970).
The four subtests that contributed to this
factor were Identification with Friends,
Identification with Teacher, Identification 
with Father, and Identification with 
Mother. 

Factor 5 is sex. The major loadings are 
biological sex and “psychological” sex, as 
measured by the It Scale for Children 
(Brown, 1956). 

These factors were derived from data col- 
lected from children attending prekinder- 
garten educational intervention classes in 
1966, 1967, and 1968, at which times they 
were four and five years old. 

In this paper, the present authors present 
data about the subsequent school achieve- 
ment of all of these children who could be 
located in the Atlanta Public Schools. Data 
are available for the large sample (Sample 
1 of Richards and McCandless, 1972) of 
children who have now completed the sec- 
ond grade. Some data are also available for 
the smaller Sample 2 of Richards and Mc- 
Candless. School success (as judged from 
standardized school readiness or achieve-
ment scores) is interrelated with the five
factors of socialization summarized above. 


METHOD


Subjects 


Sample 1. From the Richards and McCandless 
(1972) data, complete test data were available for 
181 of the Richards and McCandless Sample 1. 
There were 86 boys and 95 girls in this group. These 
children attended prekindergarten in 1968-1969. 
The black-white ratio was approximately 30% 
white to 70% black. One hundred and forty-one of 
these children (69 boys, 72 girls, same racial ratio) 
were located at the end of their kindergarten year; 
125 (60 boys, 65 girls, black-white ratio of 3 to 1) 
were located at the end of their first-grade year; 
and 73 (39 boys and 34 girls) were located at the 
end of their second-grade year. The racial ratio 
continued at 3 blacks to 1 white. 

Sample 2. The children from the prekindergarten 
classes of 1966 and 1967 are grouped together in 
Sample 2, which is much smaller than Sample 1. 
The original number was 74, with 32 boys and 42 
girls and, again, there was a black-white racial ratio 
of 3 to 1. After the kindergarten year, only a small 
number of subjects could be located and follow-up 
multiple regression analyses were no longer justi- 
fied. 


All children in Samples 1 and 2 come from neigh- 
borhoods classified as poverty areas by Office of 
Economic Opportunity standards.

Metropolitan Readiness Tests or Metropolitan
Achievement Tests were employed as measures
of academic achievement. These tests are
given in April of each year at these grade levels in
the Atlanta Public School system.

Factor scores from the various factor-analytic
solutions of Richards and McCandless (1972)
for subjects in Samples 1 and 2 were correlated with
their Metropolitan Readiness and/or Achievement
Test scores by the use of multiple, stepwise regression
analyses. The analysis provided the cumulative
percentage of variance for Metropolitan
Readiness and Achievement Tests accounted for
by each factor.* The purpose of these analyses was
to determine the relations between the factors
and subsequent school performance as measured
by Metropolitan Readiness and/or Achievement
Test scores.
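
The way a stepwise regression produces a cumulative percentage of variance can be illustrated with a minimal sketch. It assumes a forward procedure in which, at each step, the factor that raises R-squared most is entered next; the original analysis program is not described, so this is an illustration rather than a reconstruction. The data are synthetic and the function names (`solve`, `r_squared`, `forward_stepwise`) are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the predictor columns, with an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(factors, y):
    """Enter factors one at a time, each step adding the one that raises R^2 most.

    Returns a list of (name, added % of variance, cumulative % of variance).
    """
    remaining = dict(factors)
    chosen, steps, current = [], [], 0.0
    while remaining:
        best = max(remaining, key=lambda name: r_squared(chosen + [remaining[name]], y))
        chosen.append(remaining.pop(best))
        new = r_squared(chosen, y)
        steps.append((best, 100 * (new - current), 100 * new))
        current = new
    return steps

# Synthetic factor scores and test scores (NOT the study's data).
random.seed(0)
n = 90
factors = {f"Factor {k}": [random.gauss(0, 1) for _ in range(n)] for k in range(1, 5)}
y = [0.6 * factors["Factor 1"][i] - 0.3 * factors["Factor 3"][i] + random.gauss(0, 1)
     for i in range(n)]

steps = forward_stepwise(factors, y)
for name, added, cumulative in steps:
    print(f"{name}: +{added:.2f}% (cumulative {cumulative:.2f}%)")
```

In a procedure of this kind the added percentages sum to the final cumulative percentage, which is how a "total" column for Factors 1-4 would be read.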

Since there was such a high rate of loss of follow-up
subjects from 1969 to 1972, all follow-up
samples were compared to the original sample to
determine if each of the follow-up samples was
representative of the original. Peabody Picture Vocabulary
Test scores of each of the follow-up samples
were compared to the scores of the subjects in
the original sample (1968-1969), and no significant
differences were found for either males or females.
In addition, the race and sex ratio of each of the
follow-up samples did not differ significantly from
the original sample. Therefore, the evidence indicates
that each of the follow-up samples, although
reduced in size from the original sample, is representative
of the original sample.


RESULTS 


The correlations between Factors 1-4
and Metropolitan Reading Readiness and/or
Metropolitan Achievement Test data are
shown in Table 1.

Within Sample 1, whose members attended
prekindergarten in the academic
year 1968, the Metropolitan Readiness
data for males reveal that there is a significant
correlation (at the .01 level) between
Factor 1 (verbal facility) and level of reading
readiness. For female subjects, there are
significant relations for the first three factors:
Factor 1, verbal facility, at the .01
level; Factor 2, coping with anxiety by
withdrawal (the less such coping, the higher
the readiness level, at the .01 level); and
Factor 3, coping with anxiety by aggression
(the lower the standing in this factor, the
higher the readiness level, at the .05 level).
Readiness tests were administered at the
end of these subjects’ kindergarten year,
one year after the factor measures had been
made.

* When data are analyzed by sex, as they are in
this paper, Factor 5 (sex) is meaningless. [...]
is a demographic factor of relative advantage
and enters into no meaningful relations. [...]
seems to represent test-specific and error [...]
and disappears when analysis is conducted [...]
subject. These three factors are mentioned
because they entered into the original [...],
but they are not discussed in the paper.

For the Metropolitan Achievement Test 
data at the end of the first grade, two years 
after factor assessment had been made, the 
following findings are revealed. For males, 
there are no significant correlations between 
achievement and any factor score. For fe- 
males, there are significant correlations be- 
tween achievement and scores for Factors 1 
and 3, both at the .01 level. The higher the 
verbal facility factor score and the lower 
the coping with anxiety by aggression score, 
the higher the achievement standing. 

For the Metropolitan Achievement Test 
data at the end of the second grade, three 
years after factor assessment had been 
made, the following significant correlations 
were found. For males, there are significant 
correlations between achievement scores 
and scores for Factors 1 and 4, both at the 
.05 level. The higher the verbal facility factor
and the higher the alienation score, the
higher the achievement standing. For fe- 
males, there are significant correlations at 
the .01 level between achievement and Fac- 
tors 1 (verbal facility) and 3 (coping by 
aggression). The higher the verbal facility 
factor score and the lower the coping with 
anxiety by aggression score, the higher the 
achievement standing for females. 

The cumulative percentage and the levels 
of significance of the percentage of the vari- 
ance of the Metropolitan Readiness Test 
and Metropolitan Achievement Test scores 
accounted for by the various factors are 
shown in Table 2. For Metropolitan Readiness
data at the end of kindergarten among
male subjects, only Factor 1 (verbal facility)
accounts for a significant (.01) amount
of the test score variance. Factors 1-4 together
account for a total of 26.80% of the
variance. For female subjects, Factors 1
(verbal facility), 2 (coping with anxiety by
withdrawal), and 3 (coping with anxiety by
aggression) each account for a significant
amount of the Metropolitan Readiness Test
variance (all at the .01 level or less), and
Factors 1-4 together account for 50.87% of
the Metropolitan Readiness Test data variance.

TABLE 1
CORRELATIONS (BY SEX) BETWEEN FACTORS 1-4
AND METROPOLITAN READINESS AND
METROPOLITAN ACHIEVEMENT
TESTS DATA

                                   Factor
Sex    | 1. Verbal | 2. Coping with | 3. Coping with | 4. Alienation
       | facility  | anxiety by     | anxiety by     |
       |           | withdrawal     | aggression     |

Metropolitan Readiness Test (at end of kindergarten)
Male   | .459**    | .012           | .086           | .196
Female | .564**    | .359**         | .201*          | .121

Metropolitan Achievement Test (at end of first grade)
Male   | .242      | .024           | .146           | .024
Female | .346**    | .198           | .392**         | .122

Metropolitan Achievement Test (at end of second grade)
Male   | .362*     | .015           | .209           | .378*
Female | .552**    | .221           | .489**         | .047

*p < .05.
**p < .01.

For Metropolitan Achievement Test data 
at the end of the first grade, no single factor 
accounts for a significant amount of vari- 
ance among the male subjects, and Factors 
1-4 together account for 8.95% of the total 
variance. For girls in the same follow-up, 
both Factors 1 (verbal facility) and 3 (cop- 
ing with anxiety by aggression) account 
significantly (.01 level or less) for variance 
in achievement scores. Factors 1-4 together 
account for 29.57% of the total achievement 
test variance. 

For Metropolitan Achievement Test data
at the end of the second grade, Factors 1
(verbal facility) and 4 (alienation) each
account for a significant (.05) amount of
achievement test variance for males, and
Factors 1-4 altogether account for 29.95%
of the total achievement test variance. For
females, Factors 1 (verbal facility) and 3
(coping with anxiety by aggression) each
account for a significant (.01) amount of
achievement test variance, while Factors 1-4
together account for 46.64% of the total
achievement test variance.

TABLE 2
PERCENTAGE (BY SEX) OF THE VARIANCE OF THE
METROPOLITAN READINESS OR METROPOLITAN
ACHIEVEMENT TESTS DATA ACCOUNTED
FOR BY FACTORS 1-4

                                   Factor
Sex    | 1. Verbal | 2. Coping with | 3. Coping with | 4. Alienation | Total
       | facility  | anxiety by     | anxiety by     |               | (%)
       |           | withdrawal     | aggression     |               |

Metropolitan Readiness Test (at end of kindergarten)
Male   | 21.02**   | 0.29           | 1.71           | 3.78          | 26.80
Female | 31.82**   | 9.37**         | 9.55**         | 0.13          | 50.87

Metropolitan Achievement Test (at end of first grade)
Male   | 5.87      | 0.06           | 2.08           | 0.34          | 8.95
Female | 9.91**    | 3.47           | 15.40**        | 0.79          | 29.57

Metropolitan Achievement Test (at end of second grade)
Male   | 11.26*    | 1.12           | 3.32           | 14.25*        | 29.95
Female | 30.46**   | 0.42           | 15.64**        | 0.12          | 46.64

*p < .05.
**p < .01.
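
The percentages of variance in Table 2 can be related to the correlations in Table 1 by a short arithmetic check. The sketch below assumes that the factor with the largest correlation was entered first in the stepwise analysis, in which case its percentage of variance should be close to 100 times its squared correlation. Only three rows in which Factor 1 has the largest correlation are used; the entry order itself is an assumption, not something the paper reports.

```python
# (test occasion, sex): (Factor 1 correlation from Table 1,
#                        Factor 1 percentage of variance from Table 2)
rows = {
    ("readiness, end of kindergarten", "male"): (0.459, 21.02),
    ("readiness, end of kindergarten", "female"): (0.564, 31.82),
    ("achievement, end of second grade", "female"): (0.552, 30.46),
}

def percent_variance(r):
    """Percentage of variance shared by two variables whose correlation is r."""
    return 100.0 * r * r

differences = {}
for key, (r, reported) in rows.items():
    differences[key] = percent_variance(r) - reported
    print(key, f"100 * r^2 = {percent_variance(r):.2f}%", f"reported = {reported:.2f}%")
```

The small remaining differences are of the size expected from rounding the correlations to three decimal places.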

Where N was sufficient for any meaning- 
ful analysis, the pattern of relations for the 
small Sample 2 was like that for Sample 1. 


Discussion 


The most important result from the pres- 
ent study is that it is possible to predict the 
academic success of inner-city children by 
using the factors from Richards and Mc- 
Candless (1972). The subjects’ scores on cer- 
tain variables when they were in preschool 
were related significantly to their academic 
success when they were in kindergarten and 
first and second grades. The only exception 
to this pattern was for the males at the end 
of the first grade. 

The most important factor for predicting
the academic success of both males and females
was Factor 1, verbal facility, which
is composed primarily of teacher-rated variables
such as verbal skills, quality of
speech, activity versus passivity of [...],
and a verbal intelligence test (Peabody
Picture Vocabulary Test). Verbal facility
was consistently the most important factor
relating to academic success of males and
females. Therefore, an early education program
for inner-city children should probably
concentrate on developing the verbal
facility of children. This factor is probably
especially crucial for males, as will be
discussed.

The next most important result is the differences
in predictions for males and females.
The factors were considerably less
effective in predicting academic success for
males than for females for all three years.
The fact is that fewer of the original
variables were related to the academic success
of males than females. The greatest
difference in importance of variables occurred
with Factor 3, coping with anxiety
by aggression. Females who were rated by
their teachers as being cooperative, nonaggressive,
and nondisruptive were more
likely to be academically successful. This
factor was never related to academic success
of males during the follow-up. The
question that needs to be answered is why
cooperativeness on the part of females, but
not males, should be related to their academic
success.

Another important result was the loss of
prediction from the end of kindergarten to
the end of first grade. Even more unusual
than the loss of prediction at the end of first
grade was the recovery of prediction at the
end of the second grade. The total male
achievement variance accounted for actually
surpassed that accounted for at the
end of kindergarten (29.95% at the end of
second grade and 26.80% at the end of kindergarten).
The total female achievement
variance accounted for was almost the same
as at the end of kindergarten (46.64% at
the end of second grade and 50.87% at the
end of kindergarten). These results are
made even more mysterious by the finding
that Factor 4, alienation, accounted for a
significant amount of male achievement
variance at the end of second grade, the
only time this factor accounted for a significant
amount of variance for either males
or females. First-order correlations were
run between the original raw scores for
males on the different variables that con- 
tributed to Factor 4, and it was determined 
that Children’s Self-Social Constructs Test 
3, identification with mother, was the single 
variable of Factor 4 that accounted for most 
of the achievement variance. The correlation 
was in the following direction: the more the 
alienation from the mother at preschool age, 
the more academically successful the male 
at the end of the second grade. 

Certain questions about different varia- 
bility for boys and girls arise. The differ- 
ences in variability are not significant, al- 
though they are in the direction of the girls 
being more variable than the boys. There 
were no significant differences between dif- 
ferent-raced boys and girls. The correlation 
coefficients clearly indicate continuity of 
tests from the preschool to the early ele- 
mentary school years. 

The recovery of prediction at the end of 
the second grade is most probably not an 
artifact of the reduced size of the follow-up 
sample at the end of second grade, since the 
follow-up sample does not appear to be different
from the original sample on any crucial
dimensions (see Results section for explanation).
Therefore, the question remains
as to why predictions are better for females 
than males, why these predictions drop 
completely out of sight for males at the end 
of kindergarten, and why alienation from 
the mother is related to academic success of 
males at the end of the second grade. 

A possible explanation is as follows. 
Prekindergarten attendance for females is 
associated with a reduction of Factor 3, 
coping with anxiety by aggression, which 
for females is negatively related to verbal 
facility and, one may be sure, teacher ap- 
proval. Thus, prekindergarten attendance 
may set girls up nicely for public school
attendance. Prekindergarten (at least the 
prekindergartens in this study) may have 
the opposite effect on males. Four-year-old 
males are still treated as “boys will be
boys, especially cute little boys.” That is,
they are allowed to be aggressive and even 
encouraged to be creative, spontaneous, and 
curious. Creativity, spontaneity, and curi- 
osity for young males are likely to take the 
form of physical aggression. The kindergar- 
ten experience is more similar to prekinder- 
garten than to later formal schooling, but 
when these children begin the traditional, 
passive-obedience-oriented first grade the 
following year, the females may be much 
better prepared than the males to adjust. 
Young males, especially young black males, 
from first grade on are no longer allowed to 
be cute, aggressive little boys and, there- 
fore, are not able to cope with their anxiety 
in any way that would not affect their ver- 
bal facility. The males may have been 
pushed by the system and their teachers 
back toward passivity and withdrawal as 
coping techniques. As reported in Richards 
and McCandless (1972), being high in this 
factor for boys is associated with being low 
in verbal facility. Thus, no factor scores 
were related to the academic success of 
males at the end of the first grade. It seems 
fair to say that males ran into something of 
a brick wall when they entered the public 
schools. The differential male-female per- 
formance for the subjects in this study only 
began to show up strongly after the first 
grade. There was no difference in the verbal 
facility of the males and females at the end 
of prekindergarten. The passivity-obedience [...]
schools also affected the [...]
the female, but not
nearly so strongly as it affected the male.

[...] “public school shock” [...]
fact that Factors 1 [...] significantly
to their academic performance.
The females also recovered from their less
severe shock and Factors 3 [...]
reached their original level of prediction of
academic success.

The term “alienation from mother” may
be unfortunate. The nature of the Chil- 
dren's Self-Social Constructs Test item 
which defines this score is such that, from 
the point of view of face validity, one might 
also say that a high-alienation-from-mother
score really means “independence
from mother.” The matter is one that is
intriguing for further research, as the “alienation”
or “intimacy” items in the Children’s
Self-Social Constructs Test have entered
into rather strong and consistent relations
within the data collected by Richards
and McCandless (1972) and by the present
authors. If the item taps independence from 
mother rather than alienation from mother, 
then the present results are logical enough, 
particularly with this sample, in which
there was no father figure in nearly one half 
of the homes for both the black and the 
white children. 

The fact that predictions for females are 
consistently better than for males supports 
the idea that the inner-city male is at a 
greater disadvantage than the inner-city female.
It seems possible that inner-city females
can be more successful academically
by being cooperative and nice, while this is 
not true for inner-city males. This finding 
may be a result of the long-standing preju- 
dice against the inner-city male. 

A long-range value of the present study 
and of the Richards and McCandless 
(1972) study is that it is now possible to 
construct a test battery composed of the top 
loading variables of the factors that suc- 
cessfully predicted the academic success of 
the inner-city students. This test could be 
used as an aid in detecting those inner-city 
children who are likely to be successful and 
those who are not. Special curriculum could
then be provided to help both of these
groups experience success in school. This
test could be administered by teachers in less
than one half hour per child and could be a
valuable aid in helping the teacher know
more about the potential successes and failures
of her students. This procedure could
be especially beneficial for males, but this
also seems to call for certain changes in the
public schools. Inner-city males must be allowed
some method of coping other than
being pushed into passivity and obedience
while they are in school.
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REASONING AND PRAISE: 


THEIR EFFECTS ON ACADEMIC BEHAVIOR¹


SUZANNE JOHNSON TAFFEL, K. DANIEL O'LEARY,² AND
SANDRA ARMEL


State University of New York at Stony Brook 


Second-grade children were given reasons or praise for engaging in an
academic task. The effects of reasoning and praise were then evaluated
by assessing the subsequent independent maintenance of that task
behavior. To insure plausibility of the reasons given, children similar
to the subjects in the main study were asked why they studied
arithmetic. Their reasons were then given to the experimental subjects
in two separate studies. Verbalizing reasons to the child was as effective
or more effective than praising him; the reasoning and praise subjects
worked longer and completed more problems correctly than the
control subjects. There was no additive effect of combining reasons
with praise.


One objective of both parents and educa- 
tors is to teach children to maintain certain 
important behaviors in the absence of im- 
mediate external reinforcement or punish- 
ment. Often they emphasize the importance 
of some activity by giving reasons to the 
child for engaging in the desired behavior. 
As Cheyne and Walters (1970) have
pointed out, reasons provide at least two
fairly distinct kinds of information. First,
they delineate what behavior should be carried
out, and second, they describe why that
behavior should be performed. 

¹ The assistance of Warren Hochberg, principal,
and the second-grade teachers of Minnesauke [...]
The project presented herein was performed [...]
Health, Education, and Welfare. The opinions expressed
herein, however, do not necessarily reflect
the position or policy of the U.S. Office of Education,
and no official endorsement by the U.S.
Office of Education should be inferred.
² Requests for reprints should be sent to K. D.
O’Leary, Department of Psychology, State University
of New York, Stony Brook, New York.


While reasons, as described here, are frequently
used as a means of persuading children
to initiate or sustain certain activities,
there are only a few experimental investi- 
gations attesting to their effectiveness. Fur- 
thermore, only one study (Staub, 1972) has 
investigated the effects of reasons alone 
rather than in conjunction with some other 
treatment procedure. Two studies, for ex- 
ample, involved reasoning in combination 
with punishment. In the first, Cheyne and 
Walters (1969) told some subjects that they 
were not to play with certain toys because 
the experimenter did not have any more 
like them and he was afraid that they 
might become worn out or broken. Other 
children were simply told not to touch or 
play with the toys. In a subsequent resist- 
ance-to-temptation session in which sub- 
jects were left alone with the prohibited 
toys, those children who had been given a 
reason for not playing with the toys 
touched them significantly less often and 
for shorter periods of time. A similar finding
was reported by Cheyne (1969), who
verbally punished some children for selecting
a certain toy by saying “that is bad.”
Other children were also told “you should
not play with that toy.” Still other children
were given a reason for the prohibition,
“that toy belongs to someone else.” When
left alone with the toy, children in the latter
two conditions deviated later, less often,
and for less time than children in the first 
condition. However, only for third-grade 
children was the reason for the prohibited 
behavior effective in increasing resistance to 
deviation. Apparently, for the kindergarten 
subjects involved in this study, the reason 
used was not capable of motivating resist- 
ance behavior either because this particular 
reason did not make "sense" to these chil- 
dren or perhaps because reasons in general 
are not effective with this age group. In a 
third investigation, by Elliott and Vasta 
(1970), sharing behavior was significantly 
increased by adding the reason, “if you do 
something nice for someone else, it means 
you are a good boy,” to a treatment condi- 
tion involving the modeling of sharing be- 
havior with a subsequent vicarious reward. 
Finally, only in an experiment by Staub 
(1972) was it demonstrated that reasons 
alone could effectively increase junior high 
school students’ choices for a larger delayed 
reward over a smaller immediate one. 

The present investigation deviated from 
previous research in at least two respects. 
First, it employed reasons to focus on the 
maintenance of academic behavior rather 
than the resistance to temptation, sharing, 
or delay of gratification which were the tar- 
get behaviors of the studies cited above. Sec- 
ond, unlike the Cheyne and Walters (1969), 
Cheyne (1969), and Elliott and Vasta 
(1970) experiments which used reasons in 
conjunction with punishment or modeling, 
the present study attempted to delineate 
the effects of reasons alone. 

In the present study, in addition to rea- 
soning, a second experimental group, in- 
volving praise statements for engaging in 
the academic target behavior, was also 
used. Since a number of investigations have 
previously demonstrated the effectiveness of 
praise upon the study behavior of elemen- 
tary school children (Becker, Madsen, Ar- 
nold, & Thomas, 1967; Cossairt, Hall, & 
Hopkins, 1973; Hall, Lund, & Jackson, 
1968; Madsen, Becker, & Thomas, 1968; 
Thomas, Becker, & Armstrong, 1968), a
comparison between the praise and reasoning
experimental conditions seemed to be a
particularly appropriate means of measuring
the relative effectiveness of reasons on
an academic task; that is, reasons are being 
compared to a praise procedure because the 
latter has been shown to be effective in a 
host of studies. 

The overall purpose of this investigation 
was to assess the effectiveness of reasons in 
children’s maintenance of an academic task 
by comparing such a procedure to both an 
experimental group that was praised for 
academic behavior and to a control group 
that was given neither praise nor reasons 
for engaging in the academic task. 


EXPERIMENT 1 


Method 


This study was a 2 × 2 factorial design in which
some children were given reasons (reasoning condition)
for engaging in the academic behavior,
some were given praise (praise condition) for
doing so, some were given both reasons and praise
(reasoning and praise condition), and some were
given neither praise nor reasons (control condition)
for working at the academic task.


Subjects 


The subjects were 40 second-grade children
from a public elementary school, 20 males and 20
females. The subjects were white, middle-class
suburban residents with average intelligence. The
investigators obtained written permission from
each child's parents prior to the child's involvement
in the study.


Procedure


The subjects were randomly assigned to one of
the four conditions with the stipulation that an
equal number of males and females were involved
in each condition. Each child was seen individually
by the experimenter throughout the
treatment or control procedure. The study was
conducted in the fall of the year.

Each subject, upon entering the experimental
room, was seated at a desk and given a large
booklet of grade-appropriate arithmetic problems
which began with very easy problems that became
progressively more difficult. The experimenter
asked the subject to begin working on the problems
and spent an initial 5 minutes sitting with
the child as he worked. At the end of this time,
the experimenter told the subject that she had [...]
desk in another part of [...]
and could stop whenever he wanted [...]
he could [...]
that he wished to stop working on the problems.


At that time, the experimenter escorted the sub- 
ject back to his class. If the subject did not in- 
dependently stop work after 45 minutes, the ex- 
perimenter terminated the session. The dependent 
measures used were the time the subject spent 
independently working on the problems and the 
number of problems the subject completed cor- 
rectly after the experimenter had left the subject 
to work on his own. 

During the initial five minutes of the experi- 
mental session, the 10 children in the reasoning 
condition were given four reasons by the experi- 
menter for doing arithmetic. The reasons used 
were obtained from another group of second 
graders in a different public school within the 
same school district who answered the question, 
“why should we study arithmetic?” The ex- 
perimenter verbalized the following reasons at one- 
minute intervals while the subject was working 
on the problems: 


1. It’s important to learn arithmetic so that 
you will know how much change you should 
get back when you buy something at the store. 
That way you won't get cheated. 

2. It’s important to learn arithmetic so that 
people will think you are smart and not dumb. 

3. It's important to learn arithmetic so that 
you can measure the amount of flour you need 
to bake a cake, or so that you can measure the 
amount of wood you need to build a shelf. 

4. It's important to learn arithmetic because 
you need it for lots of jobs that you might want to 
get when you grow up; you need arithmetic to 
be a scientist, a teacher, a carpenter, mechanic,
a businessman, and for many, many other jobs 
too. 


The 10 children in the praise condition received 
praise instead of reasons from the experimenter 
during the initial five minutes of the session. The 
praise statements consisted of comments such as
“you are doing very well,” “good boy (girl),”
“you’re doing just fine,” “good job.” The comments
were also given at one-minute intervals
while the subject was working on the problems. 

In the reasoning and praise condition, the 10 
subjects were given both the praise statements and 
the reasons at one-minute intervals during the initial
five minutes of the session.

Finally, control condition subjects were given 
neither praise nor reasons for working on the prob- 
lems. The experimenter simply asked the subject 
to begin work on the problems and then sat with 
the subject for the initial five minutes of the ses- 
sion, making no comments on his performance. 
Control subjects, like experimental subjects, were 
then told they could work on the problems for as 
long as they wished and could stop whenever they 
wanted to. 

It was predicted that the subjects in both the
reasoning and praise conditions would work longer
and complete more problems correctly than control
subjects. No differential prediction was made
concerning the reasoning versus the praise sub- 
jects. 


Results 


Due to excessively large and heteroge- 
neous variances, Mann-Whitney U tests 
were executed. On number of problems 
completed correctly, no significant differ- 
ences between any of the conditions were 
found. On the time data, however, the reasoning
condition was significantly better
than the control condition (U = 23, p <
.025, one-tailed test); the praise condition
was also significantly better than the control
condition (U = 18, p < .01, one-tailed
test), and the reasoning and praise condition
was significantly better than the control
condition (U = 18.5, p < .01, one-tailed
test). There were no significant differences
between any of the experimental
groups.
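
The Mann-Whitney comparisons above can be illustrated with a minimal sketch. It computes U by direct pair counting and obtains the exact one-tailed probability of a U at least as small as the observed value, assuming 10 subjects per condition and no tied scores. The working times below are hypothetical, and the function names are illustrative; the original computations were presumably read from published tables.

```python
from functools import lru_cache
from math import comb

def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: number of (a, b) pairs with a > b, counting a tie as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

@lru_cache(maxsize=None)
def arrangements(m, n, u):
    """Number of orderings of m + n distinct scores for which U equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest score comes either from the first group (it then exceeds
    # all n scores of the second group) or from the second group.
    return arrangements(m - 1, n, u - n) + arrangements(m, n - 1, u)

def exact_lower_tail(u, m, n):
    """P(U <= u) under the null hypothesis, assuming no ties."""
    total = comb(m + n, m)
    return sum(arrangements(m, n, k) for k in range(int(u) + 1)) / total

# Hypothetical minutes of independent work (NOT the study's data);
# sessions were ended by the experimenter at 45 minutes.
control = [3, 5, 8, 10, 12, 15, 18, 20, 25, 45]
reasoning = [9, 14, 22, 28, 30, 35, 40, 45, 45, 45]

u_control = mann_whitney_u(control, reasoning)
u_reasoning = mann_whitney_u(reasoning, control)
print("U (control) =", u_control, " U (reasoning) =", u_reasoning)

# Exact one-tailed probabilities for two of the reported values of U.
p_23 = exact_lower_tail(23, 10, 10)
p_18 = exact_lower_tail(18, 10, 10)
print(f"P(U <= 23) = {p_23:.4f}, P(U <= 18) = {p_18:.4f}")
```

The two U values always sum to the product of the group sizes, so the smaller one is the value compared with the tabled criterion.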

Because the children were working on the 
problems when the reasons and praise state- 
ments were given, there was some question 
concerning how well the subject was attend- 
ing to the experimenter’s verbalizations. To 
insure the subject’s attention to the experi- 
menter’s comments, a second experiment 
was performed. 


EXPERIMENT 2 


Method 


Since there appeared to be no additive effect of 
combining reasons with praise in experiment 1,
only three groups were used in this investigation: 
the reasoning condition, the praise condition, and 
the control condition. 


Subjects 


The subjects were 30 second graders, 15 males 
and 15 females, from the same public elementary 
school used in experiment 1. Parental consent was 
again obtained prior to each subject's involvement
in the study. This study was conducted in the
spring of the year.


Procedure 

The general procedure remained the same as in 
experiment 1; however, two important changes
were made. First, the experimenter no longer
verbalized at one-minute intervals during the initial
five minutes of the reasoning and praise conditions.
Instead, the subject was given a sheet with
five easy problems on it. Upon completion of the
problems, to insure the subject’s attention, the ex- 
perimenter removed the sheet and verbalized either 
a reason or a praise statement, depending upon the 
condition. The experimenter then handed the sub- 
ject a second sheet of five easy problems. Again, 
upon completion of the problems, the experimenter 
removed the sheet and verbalized a second reason 
or praise statement. This procedure was continued 
until the subject had heard either four reasons or 
four praise comments from the experimenter. Con- 
trol subjects were given each of the four problem 
sheets in succession with no verbal comments 
from the experimenter. All subjects were then 
given a large booklet of grade-appropriate arith- 
metic problems of increasing difficulty with in- 
structions to work on the problems for as long as 
they wished, but the experimenter did not move 
to a desk in the experimental room to “work” as 
in experiment 1. Instead, the experimenter left the 
room entirely and asked the subject to come to 
the door and get the experimenter when he was 
finished working on the problems. It was hoped 
that this second procedural change—removing the 
experimenter from the experimental room—would 
reduce the amount of time the subjects voluntarily 
worked on the problems and, consequently, help 
eliminate the large and heterogeneous variances 
found in experiment 1. 

Both the number of problems completed cor- 
rectly and the time the subject spent working on 
the problems while he was left alone in the experi- 
mental room were used as the dependent measures. 

It was again predicted that the reasoning and 
praise subjects would work longer and complete 
more problems correctly than the controls; no dif- 
ferential prediction was made concerning the rea- 
soning versus praise subjects. 


Results 


The data did not exhibit the excessively
large and heterogeneous variances found in
experiment 1. Consequently, an analysis of
variance was executed on both the problem
and time data. The treatment effect was
significant for both number of problems
completed correctly (F = 6.47, p < .01)
and time (F = 7.04, p < .005). Subsequently,
Mann-Whitney U tests were executed on the
various comparisons. On the problem data, the
reasoning condition was significantly better
than the control condition (U = 16.5, p < .01,
one-tailed test); the praise condition was also
significantly different from the control
condition (U = 26, p < .05, one-tailed test);
and the subjects in the reasoning condition did
better than the subjects in the praise condition
(U = 27, p < .10, two-tailed test). On the time
data, only the reasoning condition was
significantly different from the control
condition (U = 14, p < .01, one-tailed test).
However, there was a tendency for praise
condition subjects to work longer than the
controls (U = 28, p < .10, one-tailed test),
and again, there was a tendency for the
reasoning condition subjects to do better
than the praise condition subjects (U = 28,
p < .12, two-tailed test).
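
For readers unfamiliar with the statistic, the following is a minimal sketch of how a Mann-Whitney U value of the kind reported above can be computed. The scores are invented for illustration only (the article does not report raw data), the equal group sizes of 10 are an assumption, and the function name `mann_whitney_u` is hypothetical. Ties are counted as one half, which is one common convention and would account for a fractional value such as 16.5.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (a, b) in which a exceeds b, with ties
    counted as one half; U_b is the complementary count.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical problem counts, NOT the study's data.
reasoning = [31, 27, 40, 22, 35, 29, 44, 25, 38, 33]
control = [18, 24, 15, 27, 20, 12, 30, 16, 21, 19]

u_reasoning, u_control = mann_whitney_u(reasoning, control)
# The smaller of the two values is the one conventionally referred to tables.
u_statistic = min(u_reasoning, u_control)
print(u_statistic)
```

The smaller U is then compared with tabled critical values for the two group sizes; a one-tailed test is appropriate only when the direction of the difference was predicted in advance, as it was here for the comparisons against the control condition.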


Discussion

If one is concerned with a child's independent
maintenance of academic behavior,
the results of these two experiments indicated
that verbalizing reasons to a child for
engaging in an academic task is as effective as,
or more effective than, giving him praise for
doing so and is certainly more effective
than giving him no reasons or praise statements
at all. The finding that reasons are at
least as effective as praise in maintaining
academic behavior has obvious interesting
implications for the classroom proper. However,
it should be noted that while praise
has been repeatedly demonstrated to be an
effective tool for modifying behavior within
the classroom itself (O'Leary & O'Leary,
1972), reasons have been experimentally
investigated only in the context of relatively
brief single sessions. It seems important to
extend investigations to repeated sessions in
the natural classroom environment. A comparative
study of the effects of praise and
reasons in a field setting, for example,
might prompt some rapprochement between
two rather divergent streams of research in
child psychology, namely, the operant and
the cognitive approaches.


An instrument was developed for determining whether or not an
individual learns relatively better from pictures than words. Based on 
this instrument, repeated classifications of elementary school children 
were found to be quite consistent. Moreover, when applied to the com- 
prehension of prose materials, the instrument served to identify 
those children for whom self-generated visual imagery would con- 
stitute an effective organizational strategy. 


In a recent investigation, Levin, Rohwer,
and Cleary (1971) demonstrated that children
could be reliably classified on the basis
of whether they learned relatively better
from pictures as opposed to words. Following
the administration of a paired-associate
list that contained both unlabeled picture
pairs and aurally presented word pairs,
elementary school children were grouped
according to whether their paired-associate
recall resulted in relatively large or
relatively small picture-word differences. It
was found that these initial classifications
of the children tended to be fairly stable
over a two-day period when a second (parallel)
paired-associate task was administered.
More recently, Mallory (1972) has
reported a similar finding based on different
materials and classification procedures.

The observed stability of individual differences
in the Levin et al. (1971) study
was actually the serendipitous by-product
of an experiment with an unrelated objective,
conducted by Rohwer, Ammon, Suzuki,
and Levin (1971). Thus, while the
Levin et al. result turned out to be interesting
in its own right, the original experiment
had not been planned to answer questions
directly related to individual differences.
Consequently, all analyses in the Levin et
al. article were admittedly post hoc, and all
conclusions were admittedly speculative.
Such is not the case here. Based on the
previous findings, we sought to demonstrate
that individual-difference-related picture-word
effects obtained on a paired-associate
learning task are not only reliable but also
applicable to more schoollike activities such
as reading.

Specifically, the dual objectives of this
study were (a) to develop a paired-associate
learning task (ideally, group administered)
consisting of both pictorial and verbal
items, from which different types of
learners could be reliably identified, and
(b) to determine whether such information
could be applied to the learning of prose
materials.


EXPERIMENT 1 


Method 


Construction of the learning task. From a popu- 
lation of over 100 line drawings of objects (animate 
and inanimate) familiar to young children, 64 
were selected to create two 16-pair lists. Only pic- 
tures with labels for which there was consensus 
(based on a pilot testing of first and sixth graders) 
were included. The 64 pictures were nonsystem- 
atically assigned to the two lists, subject to the 
following three restrictions: (a) Approximately 
equal numbers of animate objects were assigned 
to each list; (b) objects that were conceptually 
similar (e.g., knife-fork, bus-truck) were assigned
to different lists; and (c) objects whose labels were
acoustically similar (e.g., bat-cat, tire-fire) were
assigned to different lists. 

Within each 32-item list, the 16 pairs were non- 
systematically formed subject to the following two 
restrictions: (a) Objects that were obvious associates
of one another (e.g., bus-tire, doll-house)
were assigned to different pairs; and (b) objects 
were paired only if there was a possibility of con- 
structing a plausible interaction between the two. 
Following this, 8 of the pairs in each list were 
randomly designated as picture pairs and 8 as word 
pairs. Item pairs were then randomly ordered 
within each list such that different item types 
(pictures or words) appeared in the first two list 
positions, as well as in the last two; in addition, 
no more than two consecutive pictures or words 
were permitted. These measures were taken as
precautions against primacy, recency, and response- 
set effects. Four such “random” orders of the 
items were constructed, two for study trials and 
two for test trials (stimulus terms only), in order 
to prevent serial learning of the responses. Succes- 
sive revisions and replacements of items (suggested 
by item analysis), as well as revisions of the 
instructions and procedures, were conducted to 
improve the task's parallel forms (separated by 
24 hours) reliability. This included the addition of 
a third study-test cycle (actually a repetition of 
the first study-test trial items). 
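
The ordering restrictions described above can be sketched as a simple rejection-sampling routine. This is only an illustration under stated assumptions, not the authors' actual procedure: the restriction on the first two and last two positions is read here as "the two items must be of different types," and the names `valid_order`, `constrained_order`, and the pair labels are hypothetical.

```python
import random


def valid_order(types):
    """Check the ordering restrictions on a sequence of item types."""
    # First two positions, and last two positions, must differ in type.
    if types[0] == types[1] or types[-1] == types[-2]:
        return False
    # No more than two consecutive items of the same type.
    for i in range(len(types) - 2):
        if types[i] == types[i + 1] == types[i + 2]:
            return False
    return True


def constrained_order(pairs, rng, max_tries=10000):
    """Shuffle repeatedly until an order satisfies the restrictions."""
    for _ in range(max_tries):
        order = pairs[:]
        rng.shuffle(order)
        if valid_order([kind for kind, _ in order]):
            return order
    raise RuntimeError("no valid order found within max_tries")


# Hypothetical list: 8 picture pairs and 8 word pairs.
pairs = [("picture", f"pair{i}") for i in range(8)]
pairs += [("word", f"pair{i}") for i in range(8, 16)]

order = constrained_order(pairs, random.Random(0))
for kind, label in order:
    print(kind, label)
```

Calling `constrained_order` with differently seeded generators would yield several such orders, comparable in spirit to the separate study-trial and test-trial orders described in the text.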

Both the pictures and typewritten labels of 
the pictures were photographed and mounted on 
slide transparencies (1 adjacent pair of pictures 
or words per slide). Pictures and words were placed
in their predesignated positions within the 16-pair
mixed-list sequences.

In the initial pilot testing of the instrument, 
both first and sixth graders had been tested in- 
dividually. However, the final version seemed to 
lend itself to group administration. Accordingly, 
(a) only children in Grades 4-6 were included as
subjects; and (b) individual subject response
booklets were used on each test trial. In these
booklets were printed the labels of the 16 stimulus
terms, with subjects required to supply the missing
response terms.

¹ The second restriction was included for reasons
unrelated to the experiments reported here.

Procedure. The subjects were run in groups (in- 
tact classrooms). After distributing the response 
booklets, the experimenter informed the subjects 
of their task. Items for the first study trial of Form 
A were then projected onto a screen at the front 
of the room, five seconds per pair. Following the 
last study pair, subjects (having been reminded
to work quickly) were allowed 1½ minutes to
complete the first page of their (three-page)
response booklets. Two additional study-test cycles
were then provided. The next day, the experi- 
menter returned unannounced and administered 
Form B of the task in similar fashion. 

Subjects. Fifty-four subjects from two fourth- 
grade classrooms in a semirural midwestern com- 
munity were administered the task. However, due 
to second-day absences and obvious cases of non- 
compliance with task instructions, only the data 
of 43 subjects were usable. 


Results and Discussion 


Subject classifications based on the learn- 
ing of pictures and words. It should be re- 
called that a primary objective of the pres- 
ent experiment was to corroborate the 
Levin et al. (1971) finding that an individ- 
ual’s relative performance based on pictures 
and words was reliable. In the earlier study, 
it had been hoped that some subjects would 
learn pictures better than words, while the 
converse would be true for other subjects. 
However, pictures led to superior learning 
for almost all subjects and as a result, clas- 
sification of learner types was made on the 
basis of a subject’s picture-word difference 
being relatively (i.e., as compared with
other subjects) large or small. As was argued
by Levin et al., this kind of classification
procedure created some interpretive
difficulties, since with such a system a subject's
performance level and pattern may be
confounded. Consequently, an alternative
system for classifying subjects was incorporated
here.

Ideally, we hoped to identify four different
types of learners: (a) subjects who performed
relatively well on both pictures and
words (high pictures, high words); (b) subjects
who performed relatively poorly on
both pictures and words (low pictures, low
words); (c) subjects who performed relatively
well on pictures but relatively poorly
on words (high pictures, low words); and
(d) subjects who performed relatively
poorly on pictures but relatively well on
words (low pictures, high words). In actuality,
however, learners of the fourth type
were difficult to find on this task (only a
handful out of 288 subjects in the Levin et
al., 1971, study and 2 out of 43 subjects
here). Fortunately, even without such subjects,
some interesting outcomes may be anticipated.
Consider the high-picture, low-word
subjects, for example. When learning
pictures, their performance should resemble 
that of high-picture, high-word subjects 
rather than that of low-picture, low-word 
subjects; whereas when learning words, 
their performance should resemble that of 
low-picture, low-word subjects rather than 
that of high-picture, high-word subjects. If 
true, an Aptitude × Treatment interaction
of the kind described by Levin (1972)
would be produced, such that for some children
it is largely the nature of the
materials (here, pictures or words) that determines
whether or not they will display
effective learning. The stability of this type
of interaction was what interested us here.
To investigate this possibility, we classified
subjects according to whether or not
they learned relatively well from pictures:
Those who scored above the mean for pictures
were designated high-picture subjects,
while those below the mean were designated
low-picture subjects. Within the high-picture
classification, subjects were divided
into two approximately equal-sized groups
on the basis of their performance on words
(either high word or low word). As was
indicated earlier, when the same criteria for
words were applied to low-picture subjects,
only two subjects were found to be low-picture
but high-word learners. The results
of these two subjects were not included in
the classification stability analysis. The
number of subjects represented in the three
learner classifications (high picture, high
word; high picture, low word; and low picture,
low word) as well as the corresponding
picture and word means are shown in Table
1. Statistical analysis of Form A (the classification
list) data revealed significant differences
in the three groups' performance
on both pictures (F = 38.42, df = 2/38, p
< .001) and words (F = 58.92, df = 2/38,
p < .001). The nature of these differences
differed with the type of item considered,
however. Consistent with the desired classification,
Scheffé post hoc comparisons (α =
.05) revealed that (a) on pictures, both
high-picture, high-word and high-picture,
low-word subjects differed significantly
from low-picture, low-word subjects, though
not from each other, while (b) on words,
both high-picture, low-word and low-picture,
low-word subjects differed significantly
from high-picture, high-word subjects,
though not from each other.


TABLE 1

MEAN PERFORMANCE ON PICTURES AND WORDS
BY THE THREE LEARNER CLASSIFICATIONS,
EXPERIMENT 1

             High picture,   High picture,   Low picture,
Measure      high word       low word        low word
             (n = 12)        (n = 9)         (n = 20)

Form A
  Pictures   16.50           13.89            8.45
  Words      10.25            3.44            3.50
Form B
  Pictures   15.25           14.44            9.40
  Words      10.17            5.44            5.40
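
The classification rule described above can be sketched in a few lines. This is a hedged illustration rather than the authors' scoring procedure: the word cutoff is taken here to be the median word score of the high-picture subjects (one way of obtaining "two approximately equal-sized groups"), scores exactly at a cutoff are treated as low, and the function name `classify` and the example scores are hypothetical.

```python
from statistics import mean, median


def classify(scores):
    """Assign learner-type labels.

    scores maps a subject id to a (picture_score, word_score) tuple.
    """
    # Picture cutoff: the mean picture score of all subjects.
    picture_mean = mean(p for p, _ in scores.values())
    # Word cutoff (assumed): median word score among high-picture subjects.
    high_picture_words = [w for p, w in scores.values() if p > picture_mean]
    word_cutoff = median(high_picture_words)

    labels = {}
    for subject, (p, w) in scores.items():
        picture_label = "high picture" if p > picture_mean else "low picture"
        # The same word criterion is applied to low-picture subjects.
        word_label = "high word" if w > word_cutoff else "low word"
        labels[subject] = f"{picture_label}, {word_label}"
    return labels


# Hypothetical scores, NOT the study's data.
example = {"s1": (18, 12), "s2": (16, 3), "s3": (8, 4), "s4": (6, 2)}
labels = classify(example)
for subject, label in labels.items():
    print(subject, label)
```

Applying the same function to a second administration (Form B) and cross-tabulating the two sets of labels would give a correspondence table of the kind used in the stability analysis.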

Criterion list performance based on initial
classifications. Considering all 43 subjects
(including the two low-picture, high-word
subjects), the parallel forms reliability,
based on total (picture plus word)
scores on Form A and Form B separated by
24 hours, was found to be .76. More importantly,
however, when performance on pictures
and words was broken down according
to the initial classification groups, essentially
the same pattern was produced as
with the classification list itself (cf. Form A
and Form B results in Table 1). Statistically,
the previously reported results were
completely substantiated by Scheffé post
hoc comparisons (α = .05).

Until now we have considered only the
average performance of subjects in the
three classification groups. Of greater interest,
however, is whether individual subjects
who were classified in a particular way on
Form A would have been classified in the




LEARNING FROM PICTURES AND WORDS 


same way on a different occasion. To answer
this question, we classified subjects according
to their Form B performance, following
the procedures used for Form A. The
combined Form A-Form B classifications
may be found in Table 2, which shows that
9 out of 12 (75%) high-picture, high-word
subjects, 7 out of 9 (78%) high-picture,
low-word subjects, and 14 out of 20 (70%)
low-picture, low-word subjects were similarly
classified on the two occasions. A test
of the association in these data (minus the
three low-picture, high-word subjects on
Form B) was significant (χ² = 34.57, df =
4, p < .001), with the degree of predictive
association as reflected by the asymmetric
Goodman-Kruskal measure, λB (Hays,
1963), being .62.
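
The two statistics just reported can be recomputed from the cell counts of Table 2. The sketch below is an illustration, not the authors' computation: it uses the 3 × 3 counts as read from Table 2 (the middle cell taken as 7, in line with the "7 out of 9" stated in the text), treats rows as Form B and columns as Form A, and the function names are hypothetical.

```python
# Table 2 counts: rows are Form B classes, columns are Form A classes.
table = [
    [9, 1, 1],   # high picture, high word on Form B
    [1, 7, 2],   # high picture, low word on Form B
    [2, 1, 14],  # low picture, low word on Form B
]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def goodman_kruskal_lambda(table):
    """Asymmetric lambda for predicting the row class from the column class."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Correct guesses when only the overall modal row class is known.
    best_without = max(row_totals)
    # Correct guesses when the modal row class within each column is used.
    best_with = sum(max(col) for col in zip(*table))
    return (best_with - best_without) / (n - best_without)


stat, df = chi_square(table)
lam = goodman_kruskal_lambda(table)
print(round(stat, 2), df, round(lam, 2))  # 34.57 4 0.62
```

With these counts the computation reproduces the reported values (χ² = 34.57, df = 4, and a predictive association of .62, that is, 13/21), which also supports reading the garbled Table 2 cell as 7.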


Rejecting an alternative explanation. The 
data just reported, as well as those of Levin 
et al. (1971), would seem to support the 
argument that different “learner types” 
exist and that they can be identified. How- 
ever, because of the unique mixed-list na- 
ture of the learning task employed in these 
experiments, an alternative interpretation 
seems reasonable. Rather than explaining
the results in terms of individual-difference-related
learner types, it is possible to explain
them in terms of individual-difference-related
"learning styles." In particular,
when both pictures and words are presented
in a single list, the contrast produced enables
subjects to attend to one class of materials
(pictures) at the expense of the other
(words). Consequently, the high picture
learning of high-picture, low-word subjects 
may have resulted simply from an unequal 
division of attention between the two classes 
of material. A second experiment was con- 
ducted to evaluate this possibility. 

The subjects consisted of sixth graders
from a semirural midwestern community.
Following the procedures of Experiment 1,
we administered Form A of the learning
task as a 16-pair mixed list which contained
both picture and word pairs. On the
basis of the children's performance, we were
able, as previously, to identify high-picture,
high-word subjects, low-picture, low-word
subjects, and high-picture, low-word subjects.
As expected, even though the performance
of these latter children was closer to
that of high-picture, high-word subjects on
the picture pairs, it was closer to that of
low-picture, low-word subjects on the word
pairs.


TABLE 2

CORRESPONDENCE BETWEEN FORM A AND
FORM B SUBJECT CLASSIFICATIONS

                                    Form A
Form B                           1     2     3

1. High picture, high word       9     1     1
2. High picture, low word        1     7     2
3. Low picture, low word         2     1    14

Note. Three subjects who were low-picture,
low-word learners on Form A would have been
classified as low-picture, high-word learners on
Form B.

On the following day, the children were 
administered a second learning task which 
differed from the previous second task in 
one important respect: Whereas Experi- 
ment 1 utilized a parallel form of the mixed- 
list classification instrument, here we con- 
structed homogeneous (picture or word) 
lists incorporating the same paired mem- 
bers. Thus, about half of the children (two 
classrooms) received a list of 16 picture 
pairs on the second day, and the others (two 
classrooms) received a complementary list 
of 16 word pairs. If the alternative (learn- 
ing style) interpretation of the previous 
data is valid, then with no opportunity for 
high-picture, low-word subjects to divide 
their attention unequally between pictures 
and words (since subjects received a list
containing only one class of materials), dif- 
ferent performance profiles would not be 
expected on the two list types. On the other 
hand, our original (learner type) inter- 
pretation would have high-picture, low- 
word subjects performing either well or 
poorly, depending on whether they were 
administered the picture or the word list, re- 
spectively. 

As is evidenced in Figure 1, when word
pairs were presented, both high-picture,
low-word and low-picture, low-word subjects
did very poorly; however, when picture
pairs were presented, the performance
of high-picture, low-word subjects improved
dramatically while the performance of low-picture,
low-word subjects was not materially
affected. Thus, given that profiles
paralleling those of Experiment 1 were obtained
with independent homogeneous lists,
a "learning style" interpretation of those
data must be rejected. Until proven otherwise,
our "learner type" interpretation still
lives, a conclusion further supported by
the data of Experiment 2.

FIGURE 1. Mean performance (maximum = 48)
by the three learner types on the homogeneous
word and picture lists of the second day. (Abbreviations:
Hi P = high picture; Hi W = high word; Lo
P = low picture; and Lo W = low word.)
(Panels: Words; Pictures.)


EXPERIMENT 2 


A second (equally important) concern of 
the present research was that the previously 
described learner type classifications would 
relate to performance in learning tasks 
other than parallel versions of the paired-associate
classification task. In particular,
reading comprehension was selected as a 
likely candidate since it has been argued 
that (a) experimenter manipulations seem 
to affect paired-associate learning and 
reading comprehension in similar ways, and 
(b) similar processes may well underlie 
each (Levin, 1972). 

For example, it is well documented that 
in a paired-associate task, picture pairs are 
more easily learned than word pairs (cf. 
Reese, 1970). It has likewise been shown 
that a pictorial representation of textlike 
materials is more easily learned than is a
printed representation of the same materials
(Matz & Rohwer, 1971). The same
comparison may be made with regard to the
role of subject-generated visual imagery in
paired-associate learning and in reading
comprehension. That is, with relatively
concrete materials, the generation of imag- 
ined visual relationships has been found to 
facilitate both types of performance (Levin, 
1972). 

In a recent experiment, Levin (1973)
demonstrated that while subject-generated 
visual imagery improves reading compre- 
hension in general, the effectiveness of such 
a strategy depends largely upon the prereq- 
uisite skills of the student. Specifically, 
fourth graders who could decode and derive 
meaning from individual words (but could 
not effectively organize words to derive 
meaning from sentences) benefited greatly 
from instructions to generate organizational 
images on a reading task. As was predicted, 
however, children who were experiencing 
decoding and/or vocabulary problems at 
the word level did not benefit from such an 
imagery strategy. 

An analogy might be drawn vis-à-vis the
focus of the present research. Suppose that
children are classified according to the system
in Experiment 1, in which high-picture,
low-word subjects were those children who
learned relatively well from pictures but
not from words. An intriguing possibility is
that their comprehension of textlike (verbal)
materials might be improved through
the substitution or addition of pictures. On
the other hand, this would not be expected
for low-picture, low-word subjects who
have difficulty learning from pictures as
well as from words.

In this experiment, we wanted to see if the
three learner type classifications differed
with respect to reading comprehension under
naturally occurring situations (i.e., in the
absence of experimenter-suggested strategies).
In addition, however, some of the
subjects from each classification group were
instructed to employ a visual imagery
strategy while reading, with the expectation
that only the performance of those subjects
who learn relatively well from pictures
(that is, high-picture, high-word and high-picture,
low-word subjects but not low-picture,
low-word subjects) would be enhanced.


Method


Reading task. Two 10-sentence reading passages
appropriate for children of ages 9-12 were constructed
following Matz and Rohwer (1971) and
Levin (1973). The two passages (one comparing
two kinds of monkeys and the other, two cars) had
been used in research recently reported by Levin
and Divine-Hawkins (in press). Each sentence
was photographed and mounted on slide transparencies,
1 to a slide. Ten questions based on each
passage were constructed to assess comprehension.

Subjects. Children from three fourth-grade 
classrooms in a middle-class midwestern commu- 
nity participated in the experiment. 

Procedure. Form A of the Experiment 1 group- 
administered learning task was presented to chil- 
dren in each classroom, following the procedures 
previously described. The next day subjects were 
called out of their rooms individually and were 
given the two reading passages. Additionally, half 
of the subjects were given a visual imagery strategy 
prior to reading the passages. That is, they were 
told to make up pictures in their minds about what 
was happening in each story while they read it. 
The subjects were then provided with a sample 
sentence (with subjects in the imagery condition 
given practice in generating images), followed by 
an oral question about it. The first passage was 
presented on a slide projector, one sentence every 
eight seconds. Following the last sentence, the
experimenter asked the 10 questions about the passage
in a random order (i.e., the questions, which
were typed on index cards, were shuffled anew
for each subject). No reading was required of the 
subject during these oral questions, each of which 
could be answered in short phrases. The second 
passage and corresponding questions were then 
presented in similar fashion. After the second set 
of questions, the experimenter queried the subject 
regarding his perceived passage difficulty and his 
interest in the two passages. The subject was also 
asked to indicate how frequently visual images 
came to mind while he was reading the passages.
Four-point ordinal scales were used to quantify
subjects' responses to each question.²

² Unfortunately, the data derived from these
questions were uninformative and, therefore, are
not discussed further.


Results


Subject classifications on the learning
task paralleled those of Experiment 1 and
resulted in the identification of 24 high-picture,
high-word subjects, 13 high-picture,
low-word subjects, and 20 low-picture, low-word
subjects. The mean learning of pictures
and words by subjects in these three
groups is presented in Table 3. As was true
in Experiment 1, these classifications resulted
in comparable performance for high-picture,
low-word subjects and high-picture,
high-word subjects on pictures and for
high-picture, low-word subjects and low-picture,
low-word subjects on words.


TABLE 3

MEAN PERFORMANCE ON PICTURES AND WORDS
BY THE THREE LEARNER CLASSIFICATIONS,
EXPERIMENT 2

             High picture,   High picture,   Low picture,
Measure      high word       low word        low word
             (n = 24)        (n = 13)        (n = 20)

Form A
  Pictures   15.58           15.00            7.25
  Words      12.75            4.15            3.05

Since the experimenter assigned the subjects
randomly to the two reading conditions
without knowledge of their particular
learner type classifications, disproportionate
numbers of subjects ended up in the two
conditions from one learner type to the
next, as indicated in Figure 2. In scoring
the reading performance data, nothing more
than a synonymic deviation from the correct
response was accepted. Analysis of the
data (which represent the mean number of
correct responses, out of 20, on the two passages)
in Figure 2 was performed using
least squares techniques for the effects of
interest. In order to compare the reading
performance of the three learner types under
each instructional condition (regular
and imagery), learner types were nested
within these two conditions.

FIGURE 2. Mean performance on the reading
task by the three learner types under different
instructional conditions. (Abbreviations: Hi P =
high picture; Hi W = high word; Lo P = low
picture; and Lo W = low word.)
(Vertical axis: No. Correct.)

As may be seen in Figure 2, for the sub- 
jects given regular instructions prior to 
reading the passages, differences among 
learner types were small and statistically 
nonsignificant (F = 1.81, df = 2/51, p > 
.10). However, when imagery instructions 
were employed, significant performance dif- 
ferences among learner types were detected 
(F = 13.49, df = 2/51, p < .001). Scheffé
post hoc comparisons (α = .05) confirmed
the visual impression obtained from Figure
2: High-picture, high-word subjects and
high-picture, low-word subjects each dif- 
fered significantly from low-picture, low- 
word subjects, though not from each other. 

As a main effect, imagery instructions 
were not facilitative (F < 1), the explana- 
tion of which may be inferred from Figure 
2: While the performance of good picture 
learners (high-picture, high-word subjects 
and high-picture, low-word subjects) in- 
creased when imagery instructions were 
employed, the performance of poor picture 
learners declined. A further consideration of 
this result is given in the following section. 


GENERAL DISCUSSION


By approaching the “learner types" prob- 
lem in a manner different from Levin et al. 
(1971), in Experiment 1 we were similarly 
able to detect reliable individual differences 
in children's ability to learn pictorial and 
verbal materials. Some children learn both 
well; some learn both poorly. However, for 
many children whether they are regarded as 
learners or nonlearners depends on whether 
the materials are pictures or words. It is for 
just these children that previous discussions
of ordinal aptitude by treatment interac- 
tions are relevant (Levin, 1972, 1973). 

Psychometrically speaking, it is impor- 
tant to note that the Experiment 1 classifi- 
cations were sufficiently potent to overcome 
the counteracting influences of statistical 


J. R. LEVIN, P. DIVINE-HAWKINS, S. M. KERST, AND J. GUTTMANN 


regression (on Form B). Practically speak- 
ing, it is also important that learner type 
diagnoses may be couched within a group- 
administered task. At the same time, one 
should not lose sight of the fact that the 
Experiment 1 data are based on only a 24- 
hour separation. It would certainly be fruit- 
ful to determine the limits of the instru- 
ment's long-term stability. 

In Experiment 2, we capitalized on the
learner type classifications to assess a
child's performance on a reading task.
While minimal differences among groups
were discovered on the reading task per se,
when a visual imagery strategy was induced
in the children prior to reading, substantial
differences among learner types
were observed. What we found was that
children who do not learn appreciably better
from pictures than from words (low-picture,
low-word subjects) did not benefit
as much from the imagery strategy as did
those who do (high-picture, low-word subjects).
In fact, as Figure 2 suggests, imagery
instructions may well have been detrimental
to the reading comprehension of
low-picture, low-word subjects. Assuming
that such subjects have developed alternative
(nonimagery) strategies for successfully
processing prose materials under natural
conditions (cf. the bars to the left in
Figure 2), this result is not totally surprising.

Just as it has been previously demonstrated
that children first must comprehend
individual words before they can use visual
imagery to their advantage while reading
(Levin, 1973), the present research adds to
this finding by suggesting that certain
Learning Modality × Reading Strategy interactions
may also have to be considered.
Of late, visual imagery has been heralded
as an effective organizational strategy for
relatively concrete prose materials (Levin,
1972). However, when its success clearly
depends on the capabilities of the user, caveats
about its nonuniversality cannot be
echoed too loudly.


Tape recordings were made of six white and six black ninth-grade boys
speaking identically worded answers to typical school questions.


Significantly higher grades were assigned by 62 experienced white 
teachers to the recorded answers when spoken by white students 
than when spoken by black students. Teachers who were most suscepti- 
ble to vocal stereotyping could not be differentiated on the basis of 
sex, age, years of teaching experience, most frequently taught grade 
level, or the percentage of black students most frequently taught. 


The importance of vocal cues as a basis 
for forming attitudinal judgments about a 
speaker has been dramatically demonstrated 
in a series of studies in which listeners who 
heard tape recordings of bilingual or bidia- 
lectal speakers formed significantly different 
attitudes toward the same speaker, depend- 
ing on what language or dialect the speaker 
was speaking (Anisfeld, Bogo, & Lambert, 
1962; Anisfeld & Lambert, 1964; Lambert, 
Anisfeld, & Yeni-Komshian, 1965; Lambert, 
Frankel, & Tucker, 1966; Lambert, Hodg- 
son, Gardner, & Fillenbaum, 1960). 

In studies involving listeners’ judgments 
about speakers, it has frequently been found 
that listeners tend to agree with one another 
with respect to the judgments they make, 
not only in cases where these judgments are 
accurate but also in cases where these judg- 
ments are inaccurate. (See Kramer, 1963; 
Sanford, 1942, for comprehensive reviews.) 
Agreement among listeners with respect to 
the errors they make in judging a speaker’s 
characteristics is probably the most forceful
evidence of vocal stereotyping. 

Williams, Whitehead, and Miller (1972) 
recently investigated the concept of vocal 
stereotyping within an educational context 


* Requests for reprints should be sent to Thomas 
K. Crowl, Professional Studies, Richmond College, 
City University of New York, 130 Stuyvesant 
Place, Staten Island, New York 10301. 


by examining relationships between teach- 
ers’ expectations of student academic per- 
formance and teachers’ attitudes toward 
students’ speech. Using students from three 
ethnic groups, the study found that students 
whose speech was judged to be more non- 
standard were also expected to perform 
worse academically than students whose 
speech was judged to be more standard. 

It is important to note that the study re- 
ferred to above focused on expectations of 
student performance. The present investiga- 
tion was designed to find out if students’ 
actual (not expected) academic performance 
is judged differently by teachers as a func- 
tion of students’ speech characteristics. An 
attempt was made to examine some im- 
portant practical ramifications of the phe- 
nomenon of vocal stereotyping. This was 
done by changing the level of conceptual 
focus from the listeners’ attitudinal judg- 
ments and expectations about a speaker to 
the level of overt behavior actually exhibited 
by listeners toward a speaker as a function 
of the speaker’s vocal characteristics. The 
present study was directed at two general 
questions: Do differences in students’ speech 
characteristics lead to differential judgments 
by teachers about students’ academic per- 
formance? If so, is it possible to identify 
specific characteristics of teachers who are 
most susceptible to vocal stereotyping? 
METHOD


boys were used in the study, one group consisting 
of six white students of an upper-middle socioeco-


had been gathered which demonstrated differential 
judgments on the basis of rather gross speech dif-
ferences would it be feasible to try to isolate spe-
cific speech characteristics associated with differ-
ential teacher judgments.

In order to control the content of students’ an-
swers while varying the speech characteristics of
the students, 12 predetermined answers were used
for each of two questions:


Why do we celebrate Thanksgiving? 
What is the difference between a discovery and 
an invention? 


The predetermined answers were based on answers 
actually given by another group of white ninth 
graders of upper-middle socioeconomic level. The
answers were worded in standard English, with the
mean number of words being 18.3. Tape recordings
were made of all students in the study speaking all 
answers. 

To insure that each student's ethnic group
could be accurately identified from his speech, 12
judges drawn from the same population of teach-
ers who participated in the study listened to the
tapes and were asked to identify each student’s 
ethnic group. The overall accuracy of individual 
judges’ identifications was high, ranging from 75% 
to 100%, with a mean of 88.4%. 

Students were provided with a typed set of the 
predetermined answers and were given an unlim- 
ited amount of time to rehearse the answers and 
to ask the experimenter about the meaning or 
pronunciation of words or about the general proce- 
dures for taping. Answers were taped as often as 
necessary until both the experimenter and the 
student agreed that each written answer had been 
spoken verbatim and in a natural-sounding man- 
ner. 


Preparation of Tapes 


By splicing together segments from the original 
recordings, two tapes were compiled in such a way 
that on each tape (a) each of the 24 answers oc- 
curred once, (b) half of the answers to each ques- 
tion were given by black students and half by 
white students, (c) each student gave one answer 
to each question, and (d) for any given answer, 
the ethnic group of the respondent on one tape
was reversed on the other tape. Answers to the 
Thanksgiving question preceded answers to the 
"discovery" question on Tape A; on Tape B, this
order was reversed. The beginning of each tape
contained a set of instructions and four answers
spoken by the experimenter to one practice ques-
tion.
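
The four splicing constraints (a) through (d) can be sketched in code. What follows is only an illustration of one way to satisfy them, not the procedure the authors used: the student labels, the random allocation of answers to speakers, and the within-question playback order are all assumptions introduced here.

```python
import random

WHITE = [f"W{i}" for i in range(1, 7)]   # six white students (hypothetical labels)
BLACK = [f"B{i}" for i in range(1, 7)]   # six black students (hypothetical labels)
QUESTIONS = ["thanksgiving", "discovery"]
ANSWERS_PER_QUESTION = 12


def build_tapes(seed=0):
    """Return (tape_a, tape_b), each mapping (question, answer_no) -> student."""
    rng = random.Random(seed)
    tape_a, tape_b = {}, {}
    for question in QUESTIONS:
        answers = list(range(1, ANSWERS_PER_QUESTION + 1))
        rng.shuffle(answers)
        half_1, half_2 = answers[:6], answers[6:]
        # Tape A: white students speak half_1, black students speak half_2.
        # Tape B: the ethnic group speaking each answer is reversed.
        plan = [(tape_a, half_1, WHITE), (tape_a, half_2, BLACK),
                (tape_b, half_1, BLACK), (tape_b, half_2, WHITE)]
        for tape, half, group in plan:
            speakers = rng.sample(group, len(group))  # each student used once
            for answer_no, student in zip(half, speakers):
                tape[(question, answer_no)] = student
    return tape_a, tape_b


def playback_order(tape, first_question):
    """List the tape's items with one question's answers preceding the other's.

    Ordering answers by number within a question is an assumption; the
    source gives only the order of the two questions.
    """
    ordered = sorted(tape, key=lambda item: (item[0] != first_question, item[1]))
    return [(q, a, tape[(q, a)]) for q, a in ordered]


tape_a, tape_b = build_tapes()
order_a = playback_order(tape_a, "thanksgiving")  # Thanksgiving answers first
order_b = playback_order(tape_b, "discovery")     # question order reversed
```

Because each answer is spoken by a white student on one tape and a black student on the other, any difference in ratings of the same answer across tapes cannot be attributed to its wording.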


Subjects 


The subjects consisted of 62 white teachers who 
had one or more years’ teaching experience and 
who volunteered to participate. Overall, there were
more female than male subjects (n = 35 and 27,
respectively); the modal age was 25-34 years (n
= 31); the modal number of years teaching ex-
perience was 1-5 years (n = 27); the most frequent
grade level taught was either elementary (n = 30)
or senior high school (n = 25); the highest aca-
demic degree held was either a bachelor's degree
(n = 25) or a master's degree (n = 36); and
classes most frequently taught by the subjects
were comprised of 75%-100% white students (n =
45).


Experimental Procedure 


The study was presented in the guise of a
project for gathering data concerning grading prac-
tices of experienced teachers in order to establish
norms for grading oral answers. Teachers were in-
structed to grade each answer in terms of “how
well it really answers the question,” using a scale
where 10 = excellent and 1 = completely wrong.
Recordings were presented to subjects either indi-
vidually or in small groups, with half of the sub-
jects randomly assigned to listen to Tape A and
half to Tape B. After grading the answers, each
subject filled out an anonymous biographical data
sheet. In terms of sex, age, total years teaching
experience, most frequent grade level taught, high-
est academic degree held, and ethnic group of stu-
dents most frequently taught, the subjects in both
groups were remarkably similar.


RESULTS


By summing the ratings given by the sub-
jects across both tapes, it was possible to
compare the mean rating for answers when 
spoken by black students with the mean 
rating for the same answers when spoken by 
white students. An analysis of variance with 
repeated measures was initially carried out,
using the ratings of answers to each question 
separately. Since the results of the separate 
analyses were virtually identical, the data 
were combined and reanalyzed. In all analy- 
ses, the pooled interaction of individual sub- 
jects and individual students was used as 
the error term. The difference between sub-
jects’ ratings of the same answers spoken by
black students (M = 5.48, SD = 2.60) and
by white students (M = 5.82, SD = 2.59)
was significant (F = 13.22, df = 1/600, p <
.001). Thus, it was concluded that white
teachers were influenced in their evaluations 
of students’ oral answers by the speech char- 
acteristics of students whose ethnic group 
could be identified from their speech. 

The finding of a significant difference be- 
tween the ratings assigned by the two groups 
of subjects (F = 21.74, df = 1/600, p <
.001) was unanticipated in view of the
marked similarity of the two teacher groups 
in terms of biographical data and was par- 
ticularly surprising in view of the compara- 
bility of the two groups’ ratings of the prac- 
tice answers. Also unanticipated was the 
significant interaction between the ethnic 
group of student and of teacher group (F = 
14.19, df = 1/600, p < .001), indicating that 
subjects’ ratings of answers with respect to 
the students’ ethnic group differed, depend- 
ing on which tape had been listened to by 
the subject. Although the same students and 
the same answers appeared on both tapes, 
answers assigned to black students on Tape 
A were assigned to white students on Tape 
B, and vice versa. It had been assumed that 
some of the predetermined answers were in- 
herently better answers than others, but 
it had also been assumed that random as- 
signment would result in the answers on each 
tape being similar in quality for the two 
ethnic groups. In light of the above findings, 
the assumption of similar distribution of
answer quality was tested.

In order to examine the effect of answer 
quality without the effect of students’ voices, 
written versions of the answers used in the
study were presented to 42 student teachers.
An analysis of the mean ratings of the
answers in written form indicated that the
answers assigned to white students on Tape 
A (M = 6.10, SD = 2.80) received signifi- 
cantly higher ratings (t = 4.42, p < .001) 
than did answers assigned to black students 
on Tape A (M = 5.62, SD = 2.81). Since the 
ethnic group of the student giving a particu- 
lar answer was reversed on the two tapes, 
the above finding also indicated that the 
answers assigned to black students on Tape 
B received significantly higher ratings in
their written form than did answers assigned 
to white students on the same tape. 

If the inherent quality of the answer was 
the only factor affecting subjects’ ratings of 
oral answers, one would expect white stu- 
dents’ answers to receive higher ratings on 
Tape A and black students’ answers to re- 
ceive higher ratings on Tape B. In fact, 
white students’ answers did receive higher 
ratings on Tape A (White, M = 6.22, SD = 
2.72; Black, M = 5.52, SD = 2.66). But 
black students’ answers did not receive 
higher ratings on Tape B (Black, M = 5.44, 
SD = 2.54; White, M = 5.42, SD = 2.40). 
In view of the evidence cited above, it ap- 
pears that inherently superior answers 
spoken by black students were not perceived 
as any better than inherently inferior an- 
swers spoken by white students; or con- 
versely, inherently inferior answers spoken 
by white students were perceived as being as 
good as inherently superior answers spoken 
by black students. 
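
The cell means reported above can be checked against the overall means with a few lines of arithmetic. The sketch below uses only the means given in the text; the variable and function names are introduced here for illustration, and averaging the two tapes with equal weight assumes equal numbers of ratings per cell, which follows from half of the subjects hearing each tape.

```python
# Mean oral ratings by tape and by the ethnic group of the speaker,
# as reported in the text.
oral = {
    ("A", "white"): 6.22, ("A", "black"): 5.52,
    ("B", "white"): 5.42, ("B", "black"): 5.44,
}
# Mean ratings of the same Tape A answers presented in written form.
written_tape_a = {"white": 6.10, "black": 5.62}


def overall_mean(group):
    """Average a group's oral rating over the two tapes (equal cell sizes assumed)."""
    return round((oral[("A", group)] + oral[("B", group)]) / 2, 2)


def white_minus_black(tape):
    """Oral rating advantage of white over black speakers on one tape."""
    return round(oral[(tape, "white")] - oral[(tape, "black")], 2)


overall_white = overall_mean("white")      # 5.82, the reported overall mean
overall_black = overall_mean("black")      # 5.48, the reported overall mean
advantage_a = white_minus_black("A")       # 0.70
advantage_b = white_minus_black("B")       # -0.02
written_advantage_a = round(written_tape_a["white"] - written_tape_a["black"], 2)  # 0.48
```

On Tape A the white speakers' advantage in oral ratings (0.70) exceeds the advantage their answers had in written form (0.48). On Tape B, where the better answers were assigned to black speakers, the oral ratings of the two groups are nearly equal (a difference of 0.02) instead of favoring the black speakers.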

Since differences in students’ speech char- 
acteristics led to significantly different judg- 
ments of the quality of students’ academic 
performance, an attempt was made to 
identify specific characteristics of those 
teacher subjects who were most susceptible 
to vocal stereotyping. No significant differ- 
ences were found, however, between ratings 
assigned to answers spoken by students of 
different ethnic groups by male or female 
subjects, by subjects younger than or older 
than 35 years of age, by subjects with less 
than or more than five years teaching expe- 
rience, by subjects who were elementary 
school teachers or nonelementary school 
teachers, or by subjects having taught 
classes comprised of less than or more than 
25% black students. 


Discussion 


In most previous studies of vocal stereo- 
typing, subjects have been explicitly in- 
structed to judge a speaker’s characteristics 
on the basis of vocal cues, and it has usually 
been tacitly assumed that the other kinds 
of behavior of a listener toward a speaker 
would vary as a function of the way in 


TEACHERS' EVALUATIONS OF SPEECH CHARACTERISTICS 


which the listener perceived the speaker. The 
current study did not make the task of judg- 
ing the speaker's characteristics explicit. On 
the contrary, subjects were specifically in- 
structed to evaluate the content of the 
speech and implicitly asked to ignore the 
characteristics of the speaker. A different
kind of behavior, namely, evaluating an-
swers, was measured directly, and it was
inferred that differences between evaluations 
of answers containing the same words were 
a function of the listener's perception of the 
speaker. The overall findings of the study 
supported the notion that the content of the 
same oral answer is evaluated differently 
when spoken by different persons whose 
difference in ethnic group is identifiable from 
their speech. 

The results of the current study are con-
sistent with many previous findings regard-
ing vocal stereotyping. In this study, listen-
ers judged the content of answers spoken by
black students as inferior to the content of 
answers spoken by white students. Such a 
judgment is, by definition, erroneous since 
the verbal content of black students’ and 
white students’ answers was the same. The 
results also support the findings from studies 
of bilingual and bidialectal speakers (Lam- 
bert et al., 1960, 1965, 1966), inasmuch as 
listeners of a majority ethnic group reacted 
unfavorably toward members of a minority 
ethnic group. Although no measures of sub- 
jects’ attitudes toward blacks in general 
were obtained, it is unlikely that negative 
attitudes would have been expressed. The 
subjects participating in the study, who were 
experienced teachers attending graduate 
courses during the summer, probably com- 
prised a group with rather liberal views. It 
should also be noted that subjects partici-
pating in the study came from a total of 22
different states, with only one subject from 
a southern state. 

Some differences in vocal cues between the 
two sets of answers may have been directly 
associated not with the different ethnic and
socioeconomic backgrounds of students used 
in the study but rather with the experi- 
mental procedure itself. For example, the 
fact that students rehearsed their answers 
might have added an automatic quality to
the recordings, although anecdotal evidence 
from subjects indicated that teachers were 
not aware that the answers were not spoken 
spontaneously. Also, while the wording of 
answers in standard English was useful in 
eliminating rating biases associated with 
nonstandard grammar, it is possible that the 
recorded answers given by students who may 
characteristically speak nonstandard Eng- 
lish may have sounded incongruent. Again, 
however, anecdotal evidence suggested that 
this possibility was somewhat remote since 
most teachers indicated that during the 
experiment they were not even aware that 
the students came from different ethnic and 
socioeconomic backgrounds. 

Within an educational context, the find- 
ings of the study are consistent both with 
the recent finding that students with more 
nonstandard speech are expected to perform 
worse academically than students whose 
speech is judged as more standard (Williams
et al., 1972) as well as with the much
earlier finding of Michael and Crawford 
(1927) that ratings of students’ speech are 
positively correlated with students’ aca- 
demic performance. 

While the findings of the study have im- 
plications concerning the nature of teacher- 
student interaction in the classroom and 
should alert teachers to a possible source of 
bias that might go undetected, it must be re- 
membered that the experimental conditions 
of the study differed in a number of obvious 
and important ways from a typical class- 
room situation. In evaluating oral answers in 
a classroom, teachers have visual as well as 
vocal cues on which to base their judgments, 
students’ responses are spoken spontane- 
ously rather than after practice in saying 
them naturally, and the teacher often is al- 
ready well acquainted with the student who 
is giving an answer. Also, controlling the 
content of answers may have diminished 
differences that might actually be found 
among students such as those used in the 
study, because vocabulary and grammatical 
differences were purposely eliminated. It is 
difficult to know how the combination of 
cues transmitted from the student to the 
teacher might detract from an unbiased ap-
praisal of a student's academic performance.

An important way in which the experi- 
mental conditions probably did correspond 
to a classroom situation is the fact that the 
teachers' attention was not drawn specific- 
ally to the speech characteristics of stu- 
dents. Despite the fact that students' speech 
differences were not highlighted, vocal stere-
otyping occurred. An immediate practical
concern emanating from the finding of vocal
stereotyping is the grades actually assigned
to students and the impact these grades may 
have on the student’s perception of himself 
and his subsequent attitudes toward school 
and academic performance. Although sta- 
tistically significant, the magnitude of the 
difference between ratings assigned to speak- 
ers of different ethnic groups was only .34 
on a 10-point scale. Nevertheless, if teachers 
engage in vocal stereotyping without being 
aware that they are stereotyping, the poten- 
tial cumulative influence of a student's 
speech on the teacher’s judgment of the stu- 
dent could be considerable, particularly 
when one considers the frequency of stu- 
dent-teacher verbal interaction in the class- 
room. Even though teachers may not be re- 
cording marks in a grade book every time a 
student speaks, it may be that the teacher 
makes some kind of judgment about what 
kind of person the student is, and these 
subtle judgments may ultimately affect the 
teacher's behavior toward a student. 

It should be noted that it was not possible 
to identify specific characteristics of teach- 
ers who were most susceptible to vocal stere- 
otyping, but all of the teachers in the study 
were white and very few of them had had 
much experience teaching black students. It 
should also be noted that the lower scores 
assigned to black students’ answers may re-
flect, at least in part, differences in the per-
ceptual ability of white teachers to appre-
hend black students’ speech. The result of
not being able to apprehend as easily what a
student is saying might lead to the assign-
ment of a lower score. Certainly, it would
seem fruitful to carry out similar kinds of 
investigations with different populations, 
such as black teachers or teachers from vari- 
ous geographic regions or dialect areas. 
MALE AND FEMALE TEACHER ATTITUDES AS A FUNCTION
OF STUDENTS’ ASCRIBED MOTIVATION AND
PERFORMANCE LEVELS¹


LARRY J. BRANDT AND MARY ELLEN HAYDEN²


University of Houston 


The attitudes of male and female college students were compared after 
having taught a successful or unsuccessful simulated student who was 
labeled as either an overachiever or underachiever. The data showed 
the following: performance of the child was found to be the pre- 
dominant factor in determining the teachers’ attitudes; ascriptions 
modified the performance effect; and some differences in attitudes 
were found between the male and female subjects. Implications of the
data were discussed.


The manner in which teachers form their 
opinions of students and the effect these 
opinions have on the students have become 
important issues in educational research. 
Rosenthal and Jacobson (1968) initiated 
much of the recent work in this area with 
their book Pygmalion in the Classroom. 
They found that randomly labeling some 
children "late bloomers" at the beginning of 
the school year had a significant effect on 
those children’s IQ scores at the end of the 
school year. The authors assumed that the 
increased scores reflected differential ex- 
pectations and behaviors on the part of the 
teachers. 

Subsequent experiments have yielded 
mixed results. Baker and Crist (1971) sum- 
marized nine unsuccessful attempts to repli- 
cate the Pygmalion results. However, sev- 
eral other studies have shown that teachers’ 
expectations did indeed affect their teaching 
performance. Beez (1968) found teachers 
taught more to students randomly labeled 
“high ability” than to those labeled “low 
ability.” Medinnis and Unruh (1971) dis- 
covered a higher positive/negative rein- 
forcement ratio from teachers for students 
randomly rated as “high ability” than for
those rated as “low ability.” Rothbart,
Dalton, and Barrett (1971) found that
teachers were more attentive toward chil-
dren randomly labeled “bright” than toward
those labeled “dull.” They also rated the
“bright” students as being more intelligent,
having greater potential for future success,
and having less need for approval. Brophy
and Good (1972) have suggested two reasons
for these discrepant results. First, results
inconsistent with those of Rosenthal and 
Jacobson (1968) generally spanned a long 
period of time. Second, teachers were often 
familiar with the students before ascriptions 
were made about them. Brophy and Good 
suggested that in this type of situation, in- 
formation from an experimenter is not a 
very strong research manipulation; the 
teachers may simply disbelieve or otherwise 
reject the experimenter’s information. 

Furthermore, the dependent variable in 
the Rosenthal and Jacobson (1968) study 
and in the nine attempted replications re- 
viewed by Baker and Crist (1971) was IQ 
scores. In most of the related studies in- 
vestigating teacher expectancies, the de- 
pendent variables were teacher and/or stu- 
dent behaviors. 

Both the successful and the unsuccessful 
studies referred to above used a paradigm 
that involved (a) making differential as- 
criptions about students assigned randomly 
from the same population, (b) giving the
teacher the opportunity to interact with and 
to influence the students, and (c) assessing 
differential teacher behavior and/or the re-
sultant student behavior. The present study
involved a different paradigm in that dif- 
ferential ascriptions were made as in the 
Rosenthal and Jacobson (1968) model, but 
by simulating fictitious student responses, 
the teacher had no opportunity to affect the 
behavior of the child. The teacher’s atti- 
tudes toward the child and toward herself 
were assessed at the end of the teaching 
situation. 

It was expected that the second model 
would not only confirm previous findings 
that ascriptions affect teacher attitudes but 
would also show what happens when a 
teacher has an expectation about the stu- 
dent which is either confirmed or countered 
by his behavior. Of particular interest was 
whether the resultant attitude of a teacher 
is more influenced by the teacher’s initial 
perception of the student or by the student’s 
actual performance. 

Other than the results recently reported 
by Good, Sikes, and Brophy (1973) which 
found that male and female teachers be- 
haved differently in certain teacher-student 
interactions, the question of differential be- 
haviors and attitudes of male and female 
teachers has not been studied. Thus, a sec- 
ond purpose of this study was to determine 
whether male and female subjects form 
different attitudes in a teaching situation. 


METHOD 


Subjects 


Forty-eight subjects, half male and half fe- 
male, were chosen from introductory psychology 
courses at a large university. Each volunteer was 
between 18 and 24 years of age and had no chil- 
dren in his immediate family under the age of 
nine. 


Overview of Task 


Each subject was informed that he would teach 
a six-year-old male child a discrimination learn- 
ing task but that because various characteristics in 
addition to a child's performance (e.g., his ap-
pearance) may affect teacher-child interactions,
he would neither see, hear, nor talk with the child. 
On a display panel, the subject saw which stimulus 
was presented to the child and which response was
made by the child. Subsequently, the subject was 
free to administer feedback, additional informa- 
tion, and reinforcement by pressing specific panels 
of the apparatus. He then pressed another panel 
which initiated the next trial. Supposedly, an as- 
sistant was in the room with the child for the 
purpose of conveying information to him from 
the subject. In reality, there was neither an as- 
sistant nor a child present; programmed sequences 
of correct and incorrect responses simulated the 
performance of a successful or unsuccessful child. 
The responses of the adult, along with four rating 
scales completed by the subject upon termination 
of the teaching task, comprised the dependent 
variables. Only the rating scale data are reported 
in this paper. 

A complete description of the experimental
apparatus is given by Brandt (1971).


Procedure 


The subjects were assigned randomly to the
eight treatment combinations of a 2 × 2 × 2
factorial design with the restriction that there were
the same number of education and noneducation
majors in each group of female subjects and that
all male subjects were noneducation majors. The
independent variables were (a) performance of the
child: successful-unsuccessful; (b) information
given to the subject about the motivation of the
child: overachiever-underachiever; and (c) sex
of the subject: female-male.
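
The restricted random assignment described above can be sketched as a stratified shuffle. This is an illustration under stated assumptions, not the authors' procedure: it assumes equal cell sizes (six subjects in each of the eight cells), reads the restriction as three education and three noneducation majors in each female cell, and uses made-up subject identifiers.

```python
import random
from collections import Counter
from itertools import product

PERFORMANCE = ["successful", "unsuccessful"]
MOTIVATION = ["overachiever", "underachiever"]


def assign_subjects(seed=0):
    """Map each subject id to (performance, motivation, sex, major)."""
    rng = random.Random(seed)
    cells = list(product(PERFORMANCE, MOTIVATION))  # four cells within each sex
    # Strata sizes are assumptions consistent with 24 males and 24 females.
    strata = {
        ("male", "noneducation"): [f"M{i:02d}" for i in range(1, 25)],
        ("female", "education"): [f"FE{i:02d}" for i in range(1, 13)],
        ("female", "noneducation"): [f"FN{i:02d}" for i in range(1, 13)],
    }
    assignment = {}
    for (sex, major), ids in strata.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        per_cell = len(shuffled) // len(cells)
        # Deal the shuffled stratum out evenly across the four cells.
        for k, (performance, motivation) in enumerate(cells):
            for subject in shuffled[k * per_cell:(k + 1) * per_cell]:
                assignment[subject] = (performance, motivation, sex, major)
    return assignment


def cell_sizes(assignment):
    """Count subjects in each Performance x Motivation x Sex cell."""
    return Counter((p, m, s) for p, m, s, _ in assignment.values())


assignment = assign_subjects()
sizes = cell_sizes(assignment)
```

Shuffling within each stratum before dealing subjects into cells keeps the assignment random while guaranteeing the stated balance of majors among the female subjects.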

Each subject was seated at a table in the ex- 
perimental room and was asked to read a set of 
written instructions (see Brandt, 1971). Included 
in the instructions was one of the following state-
ments regarding the fictitious student: “His school
teacher believes that he is an overachiever since
he works very hard [B1];” or “His school teacher
believes that he is an underachiever since he
doesn't work very hard [B2].” The subject was in-
formed that the experiment would be terminated
after a given number of trials regardless of the 
child's progress, and he was instructed to begin the 
experiment by pressing the “new trial” panel.

Seven blocks of eight trials each were then ad- 
ministered. The fictitious child in the successful 
condition responded correctly two, three, four, 
five, six, seven, and seven times, respectively, for 
each of the seven trial blocks. The unsuccessful 
child responded correctly two, two, two, three, 
three, three, and three times, respectively. 
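
The programmed response schedules can be written out as a short simulation. The counts of correct responses per block come from the text; the position of correct responses within a block is not reported, so the random shuffle within each block is an assumption, as are all names used here.

```python
import random

TRIALS_PER_BLOCK = 8
# Number of correct responses in each of the seven trial blocks (from the text).
CORRECT_PER_BLOCK = {
    "successful": [2, 3, 4, 5, 6, 7, 7],
    "unsuccessful": [2, 2, 2, 3, 3, 3, 3],
}


def simulate_child(condition, seed=0):
    """Return a list of 56 booleans, True marking a correct response."""
    rng = random.Random(seed)
    responses = []
    for n_correct in CORRECT_PER_BLOCK[condition]:
        block = [True] * n_correct + [False] * (TRIALS_PER_BLOCK - n_correct)
        rng.shuffle(block)  # assumption: within-block order is not given in the source
        responses.extend(block)
    return responses


successful = simulate_child("successful")
unsuccessful = simulate_child("unsuccessful")
total_successful = sum(successful)      # 34 of 56 trials correct
total_unsuccessful = sum(unsuccessful)  # 18 of 56 trials correct
```

Both simulated children begin at two correct responses out of eight in the first block; only the successful child improves steadily, reaching seven of eight in the final two blocks.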

After completion of the 56 trials, the subject 
was asked to complete four evaluation statements 
by selecting one of the following words or phrases: 
(a) very well, (b) well, (c) neutrally, (d) poorly,
or (e) very poorly. The evaluation statements 
were as follows: 


1. The child I worked with performed ____.
2. The child I worked with was ____ motivated.
3. I liked working with this child ____.
4. I feel that as a teacher I performed ____.


TABLE 1

SIGNIFICANT SOURCES OF VARIANCE FOR THE FOUR
DEPENDENT MEASURES

Statement                                   A    B    BC

1. Rating of child's performance
2. Rating of child's motivation
3. Subject's rating of enjoyment of task
4. Subject's rating of own performance

<.01

<.01

<.01

Note. All values in the table indicate p level.
The following were not included in the table be-
cause they did not reach significance: C, AB, and
AC. Abbreviations: A = performance, B =
ascribed motivation, and C = sex of subject.

Upon termination of the study, 86% of the 
subjects reported that they believed a child was 
present and that they were teaching him. The re- 
maining 14% stated that although they were un- 
sure of the existence of the child, they had per- 
formed as though a child in fact were present. 
Analyses revealed no significant differences be- 
tween these groups; as a result, data from the un- 
sure group were included in the final analyses. 


RESULTS 


The data of the female education majors 
did not differ from the data of the female 
noneducation majors. Therefore, these data 
were pooled for purposes of statistical anal- 
yses. 

An analysis of variance of the three in- 
dependent variables was performed on each 
of the four response measures (see Table 1).
Teachers’ attitudes were determined by the 
level of performance of the fictitious child; 
the strength of this phenomenon is demon- 
strated by the fact that all four measures 
yielded similar results. 

Ascribing a motivation level to the child 
had a significant effect on the teachers’ 
ratings of child performance but not on 
the other three measures. Inspection of 
Table 2, which presents cell means for each 
of the four dependent measures, reveals that 
subjects judged the child's performance 
most favorably when the two sets of in- 
formation (i.e., successful level of perform- 
ance and overachiever label) were comple- 
mentary and reflected favorably on the 
child. They judged his performance least 
favorably when the two sets of information 
were complementary and reflected unfavor- 
ably on the child. However, when the 
sources of information were conflicting, such 
that one was favorable and the other un- 
favorable, subjects rated the child's per- 
formance at intermediate values, indicating 
the importance of the motivation ascription 
in determining the teachers’ judgments. 
Similar trends, although not statistically 
significant, occurred in the other three re- 
sponse measures. Thus, in conflict situa- 
tions, the ratings indicated that the actual 
performance of the child carried more 
weight than the motivation label ascribed to 
the child. 

Figure 1 reveals the differences in mean 
ratings on the child’s performance scale as 
a function of sex of the teacher. In this 
triple interaction, both male and female 
subjects rated successful children more 
favorably than unsuccessful ones, regard- 


TABLE 2

AiBj CELL MEANS AND STANDARD DEVIATIONS FOR EACH
OF THE FOUR EVALUATION MEASURES

              1            2            3            4
Cell       M     SD     M     SD     M     SD     M     SD
A1B1     1.50   .50   1.83   .50   1.50   .50   2.42   .39
A1B2     2.33   .71   2.25   .71   1.59   uL    2.51   .92
A2B1     3.42   .63   2.92   .76   2.75   .84   3.50   .50
A2B2     3.75   .59   3.08  1.11   2.50   .85   3.58  1.10
FIGURE 1. Subjects’ mean rating of child's per-
formance for levels of apparent motivation at each
level of successfulness for female and male subjects.
(Vertical axis: mean rating of child's performance; horizontal axis:
successful and unsuccessful child, shown separately for female and
male subjects.)


less of the child's ascribed motivation level. 
However, male and female subjects differed 
in that females rated successful over- 
achievers most favorably, unsuccessful 
overachievers least favorably, and under- 
achievers at intermediate values, while 
males rated both successful overachievers 
and successful underachievers favorably 
and unsuccessful underachievers least favor-
ably.

In evaluating the child's motivational 
level, the respective ratings of male and fe- 
male subjects were surprisingly similar to 
their ratings of the child's performance. It 
can again be seen in the triple interaction 
shown in Figure 2 that while successful
children were rated more favorably than
unsuccessful children by both male and
female subjects, female subjects again rated
overachievers at the extreme ends of the
scale, whereas male subjects again rated
both successful overachievers and successful
underachievers most favorably and
unsuccessful underachievers most unfavorably.

The analysis of variance also indicated a 
significant Sex of Subject X Motivation 
Ascription interaction on the subject’s rat- 
ing of his enjoyment of the task. Females 
rated enjoyment of the task more favorably 
when working with overachievers than with 
underachievers. Males enjoyed working with
underachievers more than with overachievers.
For females, these data are consistent
with those presented in Figures 1 and 2 in
that females gave more positive ratings to
overachievers than to underachievers. Unlike
females, males rated overachievers
more favorably on the performance and
motivation scales, as indicated in Figures 1
and 2, but less favorably on the enjoyment
scale. It should be considered that in analyzing
four response measures, some effects
judged significant could very likely be
spurious.


Discussion 


This study investigated the effect of as- 
cribed motivation labels and actual per- 
formance of fictitious children on teachers’ 
perception of the learning situation and par- 
ticularly the interaction of these two varia- 
bles on teachers’ attitudes. A second pur- 
pose was to investigate whether differences 
in attitudes existed as a function of the 
subject’s sex. 


Ascribed Motivation and Actual 
Performance of Simulated Child 


Performance of the child was found to be 
the predominant factor in determining the 
teachers’ attitudes, but ascriptions modified 
the performance effect. This demonstration 
of the influence of ascriptions on teachers’ 
evaluations is consistent with the results of 
Medinnis and Unruh (1971) and Rothbart,
Dalton, and Barrett (1971), who found that
teacher performance varied as a function
of student ascriptions. However, the present
study differs from the latter ones in that
personal interaction between teacher and
child was prevented so that cues such as
physical appearance, racial and socioeconomic
status, and behavior characteristics
were eliminated. This method also precluded
any self-fulfilling-prophecy effect because
the teacher had no influence whatsoever on
student behavior, which was simulated by
the experimenter. The procedure in the present
study allowed an assessment of whether
the teacher's attitudes were more influenced
by her expectancy or by the student's performance,
unconfounded by any behavioral
interaction.

FIGURE 2. Subjects’ mean rating of child's motivation for levels of apparent motivation at each
level of successfulness for female and male subjects.

Predominance of student performance as 
a determinant of teacher attitudes suggests 
performance could eventually modify or 
overcome ascriptions if teachers were not a 
causative factor in child behavior. Unfor- 
tunately, ascriptions do cause differential 
teacher behavior toward children such that 
the child’s performance is modified and, 
thus, does not tend to counteract the ascrip- 
tion. In such cases, a self-fulfilling-proph- 
ecy situation occurs. Caution should, there- 
fore, be exercised in attaching labels such 
as “mentally retarded,” “slow,” or “dull” 
to a child. 

The mixed results obtained in attempts 
to replicate the Pygmalion study reviewed 
by Brophy and Good (1972) might be par- 
tially explained by the results of the pres- 
ent study. Some teachers may be more effi- 
cient in controlling their behavior toward 
students in such a way as to minimize or 
eliminate the influence of ascriptions. These 
teachers would then have unconfounded 
performance to evaluate and thus would be 
more likely to discover the true performance 
level of the child. Such a teacher could de- 
crease or prevent an ascription effect. Fu- 
ture studies to evaluate individual teacher 
efficiency in this area appear warranted. 


Sex of Subject 


Some differences in attitudes were found 
between male and female subjects. Although 
both males and females rated successful 
children more favorably than unsuccessful
ones, females differentiated most in their
ratings between successful overachievers 
and unsuccessful overachievers, whereas 
males differentiated most in their ratings 
between successful overachievers and unsuc- 
cessful underachievers. There were also sex 
differences in the types of students the sub- 
jects preferred to teach. Males generally en- 
joyed teaching underachievers, while fe- 
males generally preferred teaching over- 
achievers. 

These findings of sex differences are not 
surprising in that many similar cases are 
cited in the literature in which sex of the 
participants is an important variable in 
professional-client relationships, such as 
those between teacher and student (Cosper, 
1970), counselor and client (Gamsky & Far- 
well, 1968; Hebert, 1968), or experimenter 
and subject (Rosenthal et al., 1964). However,
the present study suggests that something
more basic than sex role interactions
is responsible for these variations. It appears
likely that basic differences between
male and female preferences or attitudes ex- 
ist prior to the teacher-student interaction 
and that these differences influence teacher- 
student interactions. Good, Sikes, and Bro- 
phy (1973) found differences in the behav- 
iors of male and female teachers while 
interacting with their students. These atti- 
tudinal and behavioral findings, after hav- 
ing become reliably established, should have 
important implications for educational programs.


Early research on retrieval of semantic information has provided a reasonably
accurate description of retrieval of certain kinds of well-learned
material. In the present study, 30 graduate students were
asked to produce a type of semantic information; they named
psychologists who satisfied certain restrictions. Not only was the speed
of naming a psychologist influenced by the order in which restrictions
were given, but the effect of order differed for advanced and
beginning students. Advanced-student retrieval resembled the pattern
observed for well-learned semantic material, while beginning-student
retrieval did not. Retrieval was, thus, subtly related to how much instruction
a student had completed. These data have implications for
the use of reaction time to assess progress in the acquisition of new
material.


One of the most fundamental problems 
confronting today’s cognitive psychologist is 
how to (a) represent the knowledge that a 
person has and (b) determine the mecha- 
nisms by which a person uses this knowl- 
edge. This article addresses the question of 
how knowledge structures and retrieval 
mechanisms change during the course of 
learning new material. 

In several recent experiments (Freedman 
& Loftus, 1971; Loftus, in press; Loftus & 
Suppes, 1972), a subject was shown a stimu- 
lus consisting of a noun category paired with 
either a letter or an adjective, and his job 
was to provide a word that satisfied these 
imposed restrictions. For example, a subject
who was presented with the pair animal-m
might have said mouse, moose, or monkey, 
among other possibilities. A correct response 
would have been any word beginning with 
m that named a kind of animal. For
fruit-yellow, a correct response would have been
any fruit to which the adjective yellow was
applicable (e.g., banana, lemon, etc.). A
model of semantic memory that accounts for 
the reaction time data in these experiments 
assumes that the memory store consists of a 
large number of interconnected and cross- 
referenced associative and category net- 
works. According to the model, memory is 
organized into a complex network composed 
of categories (e.g., animals) with subsets of 
each (e.g., birds, dogs) and supersets (e.g., 
living things). Within each category a 
variety of subsets exist: Some of them are 
clusters of items that are highly associated 
because they have qualities in common (e.g., 
small animals). Retrieval from this hier- 
archical structure is assumed to consist of at 
least two major steps: (a) entering the ap- 
propriate category and (b) finding an ap- 
propriate member of that category. 

In a newer version of this paradigm, the 
stimuli have become more complex; for ex- 
ample, in one experiment, subjects were pre- 
sented with stimuli consisting of a noun 
category plus both an adjective and a letter 
(e.g., animal-small-m) and had to produce 
a member of the category that satisfied the 
two restrictions. That is to say, the response 
had to be a member of the category that 
began with the given letter and to which the 
adjective was applicable (e.g., mouse). Subjects
were given the category first, but the
order in which the adjective and letter
restrictors were presented was varied. On
half of the trials, subjects saw the adjective
one-half second before the letter (e.g.,
animal-small-m), while on the remaining trials
they saw the letter first (e.g.,
animal-m-small). Reaction time was measured from
the onset of the last restrictor. The results 
indicated that a large advantage in reaction 
time exists when the adjective is presented 
before the letter. In other words, subjects are 
considerably faster at naming, for example, 
an animal-small-m than at naming an ani- 
mal-m-small. A discussion of this finding in 
terms of the network model of semantic 
memory is as follows. When a subject must 
produce a category member that satisfies 
both an adjective and a letter restrictor 
(e.g., animal-small-m), he apparently first
enters the category (animals), then restricts 
himself to the adjective-defined subclass 
(small animals), and finally he searches 
there for an item whose name begins with 
the particular letter requested (m). Thus, 
when the adjective is presented before the 
letter, the subject can begin the second step 
earlier.
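
The three-step retrieval sequence just described (enter the category, restrict to the adjective-defined subclass, search for the letter) can be illustrated with a toy lookup. This is only a sketch under stated assumptions: the nested dictionary, its contents, and the function name `retrieve` are hypothetical and are not part of the model as published.

```python
# Hypothetical toy network: category -> adjective-defined subclass -> members.
# The entries are illustrative only; they are not data from the experiments.
SEMANTIC_NETWORK = {
    "animal": {
        "small": ["mouse", "hamster", "sparrow"],
        "large": ["moose", "elephant", "horse"],
    },
    "fruit": {
        "yellow": ["banana", "lemon"],
        "red": ["cherry", "strawberry"],
    },
}


def retrieve(category, adjective, letter):
    """Return the first category member satisfying both restrictors, or None."""
    subclasses = SEMANTIC_NETWORK.get(category, {})  # Step 1: enter the category
    candidates = subclasses.get(adjective, [])       # Step 2: restrict to the subclass
    for word in candidates:                          # Step 3: search for the letter
        if word.startswith(letter):
            return word
    return None


print(retrieve("animal", "small", "m"))
```

In this sketch, presenting the adjective first corresponds to completing Step 2 before the letter arrives, which is the head start the text attributes to the adjective-letter order.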

Taken together, these experiments give 
us a reasonably good picture of retrieval 
from extremely well-learned categories. A 
question of interest is: What happens when 
categories are not so well learned or are in 
the process of being learned? One way to 
study the retrieval of information that is in 
the process of being learned is to find a 
situation in which natural learning of cate- 
gories is taking place. Such a situation exists 
in all graduate schools of psychology where 
graduate students are learning, among other 
things, the names of psychologists. At an 
institution where one of the authors was 
teaching, students learn that there are 
roughly six areas of psychology (learning, 
perception, memory, personality, social, and 
developmental) and that various psychol- 
ogists may be associated with one or more 
of these areas. Learning to associate psy- 
chologists with particular areas of research 
is tantamount to learning to categorize
psychologists with respect to these areas. Different
degrees of learning should be evident
in people who are in different stages of
graduate school.

For the moment let us assume that the 
number of credits a graduate student has 
completed is a rough index of the amount of 
psychology he knows, or the extent to which 
he has organized psychologists in semantic 
memory. The major question to which the 
present research is directed is: Does this ob- 
jective measure of learning about psychol- 
ogy (number of credits) correlate with the 
extent to which retrieval (of psychologists) 
mirrors retrieval of well-learned information
such as animals, fruits, etc.?

How do we know when retrieval of psy- 
chologists mirrors retrieval of well-learned 
information? The present experiment 
allowed such a test. Graduate students were 
asked to produce the names of psychologists. 
On any given trial, the psychologist named 
had to satisfy two restrictions: Both (a) 
an area of psychology and (b) a letter were 
shown (e.g., learning-b), and the subject 
had to produce the name of a psychologist 
that began with the given letter and who 
was associated in the subject’s mind with 
the given area. For example, a subject who 
was presented with the stimulus learning-b 
might say Bower, Bourne, or Blodgett, 
among other possibilities. On half of the 
trials, subjects saw the area first (e.g., learn- 
ing-b), while on the remaining trials they 
saw the letter first (e.g., b-learning). Reac- 
tion time was taken from the onset of the 
last restrictor. As may be clear without 
further mention, this area-plus-letter ex- 
periment was extremely similar to the adjec- 
tive-plus-letter experiment in which sub- 
jects had to name, for example, an animal 
that was small beginning with m. In the 
latter experiment, reaction time was much 
faster when the adjective preceded the 
letter. If memory for psychologists is as well 
organized, one might expect the same ad- 
vantage in reaction time to obtain when the 
area precedes the letter. If memory for psy- 
chologists is not so well organized, no such 
advantage in reaction time would be ex- 
pected. 


CHANGES IN MEMORY STRUCTURE 


METHOD 


Subjects 


The subjects were 24 students at the New 
School for Social Research, New York, New York. 
Each subject took part in one experimental 
session that lasted about 30 minutes. 


Materials 


Six areas of psychology were selected: learning, 
memory, perception, social, developmental, and 
personality. Each area was paired with eight dif- 
ferent letters, creating 48 unique stimuli. Each 
stimulus was presented with the area shown first 
or second (e.g., learning-b, b-learning) and with
an interval of .5 second between the area and letter.
Stimuli were printed on 5 × 8 inch cards.

Each subject received a random permutation of 
48 stimuli with the following restrictions: (a) 
A given stimulus (such as learning-b) occurred 
equally often in the area-letter and letter-area 
conditions; (b) half of the stimuli presented to 
any one subject were in the area-letter order, 
while the other half were in the opposite order. 
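
One way to satisfy both restrictions is sketched below. It is an illustration only, not the authors' procedure: the particular eight letters, the use of the same letters for every area, the alternation rule keyed to the subject's index, and the name `build_trial_list` are all assumptions made for the example.

```python
import random

AREAS = ["learning", "memory", "perception", "social",
         "developmental", "personality"]
LETTERS = list("bcdhmrsw")  # hypothetical: the source does not list the letters


def build_trial_list(subject_index, rng):
    """Return 48 shuffled (area, letter, order) trials for one subject.

    Order alternates across stimuli, so each subject gets 24 trials per
    order; the alternation is shifted by subject_index, so a given stimulus
    appears in both orders equally often over an even number of subjects.
    """
    stimuli = [(area, letter) for area in AREAS for letter in LETTERS]
    trials = []
    for i, (area, letter) in enumerate(stimuli):
        area_first = (i + subject_index) % 2 == 0
        order = "area-letter" if area_first else "letter-area"
        trials.append((area, letter, order))
    rng.shuffle(trials)  # random permutation of the 48 stimuli
    return trials


trials = build_trial_list(0, random.Random(1))
print(len(trials), sum(order == "area-letter" for _, _, order in trials))
```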


Procedure 


Each subject was told that the study concerned 
memory for psychologists and that he was to pro- 
duce the name of one psychologist on each trial. 
He was told that he would see items consisting of 
an area of psychology and a letter and that he 
should respond with the name of a psychologist
that began with the given letter and who was 
associated in his mind with the given area. He was 
given examples and told to respond as quickly as 
possible but to avoid errors. 

The subject sat in front of a screen in which 
was a window covered by half-silvered glass. The
index card containing the stimulus was placed in 
a dark enclosure behind the mirror and was 
presented by illuminating the enclosure. A micro- 
phone was placed in front of the subject, and he 
responded by speaking into it. 

A trial consisted of the following. As a card with 
the item printed in large type was placed in the 
darkened enclosure behind the half-silvered mirror, 
the experimenter said, “Ready” and pressed a 
button that illuminated the first half of the 
stimulus. After a .5-second interval, the second 
half of the stimulus was automatically illuminated, 
and simultaneously an electric timer with a dc
clutch was started. The subject’s verbal response
activated a voice key that stopped the clock and
terminated the trial. A warm-up period of 15
trials preceded the experimental trials.


RESULTS 


Only correct responses (56%) to each of
the 48 stimuli are included in the following
analyses. The most important and interesting
result was obtained when we separated the
subjects according to the number of graduate
course credits they had completed. We
operationally defined “beginning students”
as those who had completed fewer than 40
credits (mean number of credits for these
12 subjects was 27) and “advanced students”
as those who had completed more
than 45 credits (mean number of credits for
these 12 subjects was 54). A median
latency was obtained for each subject in
each of the two conditions (area-letter
and letter-area). For these two conditions,
group mean latencies were obtained by
averaging the medians separately for the
advanced and the beginning students. The
results of this analysis are shown in Figure
1. A two-way analysis of variance was performed
on the median reaction times in
terms of (a) order of presentation of the
area and letter, and (b) type of student.
Concerning the main effects: Order of presentation
was not significant (F < 1), but
advanced students responded more quickly
than beginning students (F = 5.60, df =
1/22, p < .05). The interaction between
these factors was highly significant (F =
33.63, df = 1/23, p < .001), indicating that
advanced students were faster when the
area was presented first rather than second,
while the beginning students favored the
condition in which the letter occurred first.

FIGURE 1. Reaction time as a function of order
of presentation of area and letter for beginning and
advanced students.
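
The latency summary described above (a median per subject per condition over correct responses, then a group mean of those medians) might be computed along the following lines. This is a minimal sketch: the trial-record layout and the function names are assumptions, and no data from the study are reproduced.

```python
from statistics import mean, median


def subject_medians(trials):
    """Map each condition to the median RT of one subject's correct trials.

    `trials` is an iterable of (condition, reaction_time, correct) tuples;
    this record layout is assumed for illustration.
    """
    by_condition = {}
    for condition, reaction_time, correct in trials:
        if correct:  # only correct responses enter the analysis
            by_condition.setdefault(condition, []).append(reaction_time)
    return {cond: median(rts) for cond, rts in by_condition.items()}


def group_means(subjects):
    """Average subject medians within each (group, condition) cell.

    `subjects` is a list of (group, trials) pairs, where group is, e.g.,
    "beginning" or "advanced".
    """
    cells = {}
    for group, trials in subjects:
        for condition, med in subject_medians(trials).items():
            cells.setdefault((group, condition), []).append(med)
    return {cell: mean(meds) for cell, meds in cells.items()}
```

Using medians within subjects limits the influence of occasional very long latencies before the group means are formed.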


Discussion 


The retrieval pattern for advanced stu- 
dents is clearly different from the pattern 
for beginning students; advanced students 
responded more quickly when the area was 
presented first, while the beginners favored 
a letter-area presentation. This finding
makes a great deal of sense when you stop
and think about what a student of psychology
knows. The advanced student is aware
that the category of psychologists is sub- 
divided into areas such as learning and per- 
ception, just as his category of animals is 
subdivided. When given the area before the 
letter, this student can restrict himself to 
the area-defined subclass and then search 
for a psychologist whose name begins with 
the particular letter requested. The begin- 
ning student, however, does not have psy- 
chology so well organized; the field is not yet 
neatly subdivided. This student knows a few 
important names; he probably knows Freud, 
Skinner, Piaget, and possibly some others. 
When the letter is given before the area, the 
beginning student probably begins scanning 
his list for a name beginning with the par- 
ticular letter requested and then produces 
that name almost irrespective of which area 
is presented. If the student stretches it,
Piaget and Skinner could both fit into quite
a few different areas of psychology.

Regardless of the exact storage or the
exact retrieval mechanisms that these two
types of students are using, it is clear that
the retrieval patterns observed in the present
study are related to how much instruction
a person has completed. Furthermore,
the retrieval pattern for the advanced students
resembles the pattern observed for
well-learned material; that is, the more experience
a student has had with the field of
psychology, the more his retrieval of this in- 
formation seems to mirror the retrieval of 
material that we know is well organized and 
learned. The implication here is that one of 
the consequences of instruction may be to 
change a student’s retrieval pattern, such 
that it is more efficient, resembling the 
retrieval of well-learned material. It ap- 
pears that we can use reaction time to assess 
the real impact of instruction in much more 
subtle ways than we now do. Instruction 
does more than teach content. In addition, 
as a person learns new material, his cogni- 
tive structure is organized and modified in 
some way. Reaction time measures such as 
the ones used in this study can give in- 
formation about progress being made and 
ultimately about the process of acquiring 
new material.


Several recent books and articles have concluded that only the quality
of the student body, and not the quality of the school or its teaching
staff, “makes a difference” on measures of student learning. These
studies, however, have measured only presage variables and have
used schools rather than teachers as the unit of analysis. The present
study, using a sample of 115 second- and third-grade teachers with
five or more consecutive years of experience teaching at their respective
grade levels, showed that teachers do affect student learning to
a degree that is both statistically and practically significant. Teacher
effects were especially robust in the data from Title I schools serving
disadvantaged populations.


1 This article is an expanded version of a paper
presented at the annual meeting of the American
Educational Research Association, New Orleans,
February 1973. The research described was supported
by National Institute of Education Contract
OE 6-10-108, The Research and Development
[...] position or policy of the National Institute of
Education, and no official endorsement by that office
should be inferred. For their assistance in preparing
the data and manuscript, the authors wish to
thank Marilyn Arnold, Carolyn Evertson, Susan
Florence, Kathy Paredes, Kathleen Senior, Jane
Sheffield, and John Sheffield.


Of late, several books and articles have
appeared which argue that “schools don't
make a difference.” These statements have
been based on the Coleman Report (Coleman
et al., 1966) and other investigations
which shared several common characteristics
that precluded drawing any such inferences.
First, they have used schools rather
than teachers as the unit of analysis.
Schools are not appropriate units for analysis
because they are staffed by teachers of
varying ability, and lumping together the
data from these individual teachers masks
rather than reveals the effects of the quality
of schooling. Only data based on the
teacher as the unit of analysis can show
that some teachers are better than others.

Second, to the extent that teaching staffs do
differ in quality, schools serving advan-
taged groups are likely to have better staffs
than schools serving disadvantaged groups 
(Mood, 1970). This factor accentuated the 
probability of finding that “schools don’t 
make a difference,” since better students get 
better teaching, thus increasing the achieve- 
ment gap between such students and less 
gifted students. Third, and most important, 
one cannot draw conclusions about school- 
ing without measuring it, but this is being 
done nevertheless. Coleman and his col- 
leagues measured presage variables like 
years of teaching experience and highest de- 
gree obtained, but they did not obtain either 
process data on teachers’ classroom perfor- 
mance or product data showing the student 
learning gains that individual teachers pro- 
duced. 

The latter technique, employed in this 
study, is best suited for demonstrating that 
teachers do in fact make a difference (or, 
more specifically, that teachers differ in 
their relative impact on student learning). 
The usefulness of such data, although seem- 
ingly obvious on a commonsense basis, has 
not been recognized or stressed until re- 
cently (Mood, 1970). Furthermore, its va- 
lidity was seriously challenged by Rosen- 
shine’s (1970) review of stability in teacher 
effectiveness. Rosenshine could locate only 
five studies containing information on 
teacher stability over long periods (one 
semester or more). Of these, one involved
recruits in an armed service training school
and two others came from a study in which 
a new curriculum was being introduced; 
therefore, the teachers were not teaching in 
their accustomed ways. Thus, only two 
studies reflected teaching by typical teach- 
ers working under normal conditions. One 
study gave no stability coefficient but stated 
that stability was very low, while the other 
reported a coefficient of .09. 

These data seemed to support the idea 
that teachers are not stable or consistent in 
the relative student learning gains they 
produce and that teacher effectiveness (by 
this definition, at least) is not a "trait" or 
stable quality. However, the data from 
these studies were from unselected teacher 
samples that may have included substantial 
proportions of new teachers and/or teachers 
recently shifted into a new grade. Such 
teachers are known to be unstable in their 
teaching behavior (appropriately so, since 
they are meeting and adjusting to new de- 
mands); therefore, they are unlikely to 
show much stability. 

The present study investigated teacher 
stability in producing student learning
gains in a sample of teachers with five or 
more years of experience teaching at the 
same grade level. Although few would 
argue that the amount of gain pupils show 
on standard achievement tests is the only 
or even the best measure of teaching effec- 
tiveness, it is being used increasingly for 
this purpose. The present paper concerns 
methodological considerations involved in 
obtaining unbiased estimates of teacher in- 
fluence on pupil achievement and provides 
data related to the substantive question of 
whether or not (or how much) teachers 
“make a difference.” 


SAMPLES AND MEASURES 


All second- and third-grade teachers 
(N = about 275) in a large Southwestern 
urban school system were considered for 
inclusion in a comprehensive investigation 
of teacher effectiveness, classroom behavior, 
and personal characteristics. All teachers 
were female. Teachers selected from the full 
sample for inclusion in the present study 
were those who (a) had at least five years
of teaching experience at their grade, (b)
had taught the same grade level during the 
three focal years (1967-1969), and (c) had 
at least 14 children with available data for 
each of these years (data on 20-30 pupils 
were available for most classes). The teach- 
ers represented 15 Title I (poverty area) 
and 35 non-Title I schools. The four sam- 
ples resulting from this selection were as fol- 
lows: 


21 Grade 2, Title I teachers (1,210 pupils);
35 Grade 2, non-Title I teachers (2,168 pupils);
20 Grade 3, Title I teachers (1,216 pupils); and
39 Grade 3, non-Title I teachers (2,744 pupils).


Pupil records were retrieved from school 
files for each of four successive years of 
regular fall achievement testing. Grade 
equivalent scores were obtained for the 
Metropolitan Achievement Test (MAT) 
subscales. Different forms of the MAT 
battery were used with each of the four 
samples, necessitating separate statistical 
analyses. 


INFLUENCES ON PREDICTIVE EFFICIENCY 


It is now generally accepted that residual 
gain scores are superior to simple pretest- 
posttest difference scores as measures of 
teacher influence. What is not clear, how- 
ever, is the importance of residualizing with 
more than the simple pretest variable 
(Cronbach & Furby, 1970). The following
series of analyses was designed to explore
this problem.

A series of regression models were com- 
pared, using (a) pretest, (b) squared pre- 
test, (c) pupil sex, (d) year of testing, and 
(e) teacher² as predictors of posttest performance.
In each comparison, one of these
influences was omitted to determine its
contribution to prediction of the criterion.
Tables 1-4 contain the results of these comparisons,
expressed as percentages of criterion
variance associated with each influence.
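
The model-comparison logic described above can be sketched as follows: fit a full least-squares model, refit with one influence omitted, and express the drop in explained variance as a percentage of criterion variance. This is a hedged illustration, not the authors' computing procedure. The function names, the data layout, and the normal-equations solver are assumptions, and the sketch drops one teacher indicator as a reference level so that the indicators are not collinear with the intercept.

```python
def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the rows of X (intercept added).

    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting; it assumes the predictors are not perfectly collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A'A | A'y]
    M = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yt for row, yt in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    pred = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yt - pt) ** 2 for yt, pt in zip(y, pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def dummies(labels):
    """Binary indicator columns for a categorical influence (first level is the reference)."""
    levels = sorted(set(labels))
    return [[1.0 if lab == lev else 0.0 for lab in labels] for lev in levels[1:]]


def influence_percent(columns, y, influence):
    """Percentage of criterion variance lost when `influence` is omitted.

    `columns` maps each influence name to its list of predictor columns,
    e.g. {"pretest": [pre], "squared": [pre_sq], "teacher": dummies(ids)}.
    """
    def rows(names):
        cols = [col for name in names for col in columns[name]]
        return [list(vals) for vals in zip(*cols)]

    full = r_squared(rows(list(columns)), y)
    reduced = r_squared(rows([n for n in columns if n != influence]), y)
    return 100.0 * (full - reduced)
```

Because the reduced model is nested in the full model, the difference is non-negative apart from rounding error, which matches the reading of the table entries as percentages of criterion variance.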


² Teachers were represented in the equations by
a set of binary variables, one per teacher. The entire
set was omitted to estimate teacher influence.


TABLE 1

INFLUENCES ON PREDICTION OF STUDENT LEARNING GAIN MEASURES IN THE GRADE 2,
TITLE I SAMPLE

                                    M                          Influence (%)
MAT subtest                Pretest  Posttest    Pretest  Squared          Sex     Year  Teacher
Word Knowledge                1.74      2.41      32.65      .14          .23      .20     6.14
Word Discrimination           1.83      2.81      42.43      .14          .30     1.71     3.55
Reading                       1.77      2.52      22.34      .07         1.43      .29     6.90
Arithmetic Computation        1.87      2.79      27.80      .57          .00     1.04     4.50
Verbal average                1.78      2.58      46.77      .13  [illegible]     1.10     5.30
Total                         1.83      2.68      42.85      .09          .34     1.56     5.75

Note. Data on the Arithmetic Reasoning subtest were not available for this sample. Abbreviation:
MAT = Metropolitan Achievement Test.


These data suggested the following conclu- 
sions: 

1. Inclusion of a squared score variable 
to permit curvilinear regression added little 
to the precision of the estimates, although 
it was slightly more influential in Grade 3 
than in Grade 2. 

2. Inclusion of pupil sex in the model 
added very little to predictive efficiency, 
even though girls significantly outperformed 
boys at these grade levels. The sex differ- 
ence was included in the prescores, however, 
so that its influence was negligible with pre- 
scores controlled. 

3. Systematic differences among the three 
years of testing were trivial. This was ex- 
pected, since there was no known reason to 
believe that any yearly differences in resid- 
ual gain would appear. 

4. Inclusion of the teacher variable usually
yielded a significant, and often a substantial,
increase in predictive efficiency.
In short, teachers did make a difference, al- 
though pupil prescores were usually the 
strongest predictors by a considerable mar- 
gin. 

5. The influences of sex, year, and teacher 
appeared to be stronger in Title I than in 
non-Title I schools. Although the reasons 
for the year-effect differences were unclear, 
the sex- and teacher-effect differences were 
readily interpretable. Sex differences, in- 
cluding sex differences in school achieve- 
ment, are more extreme in lower- than 
in higher-socioeconomic-status populations 
(Hess, 1970), so that a greater sex effect 
was expected in the Title I schools. Simi- 
larly, the differential teacher effect was ex- 
pected on the basis of what is known about 
the relationship of ability and achievement 


TABLE 2

INFLUENCES ON PREDICTION OF STUDENT LEARNING GAIN MEASURES IN THE GRADE 2,
NON-TITLE I SAMPLE

                                    M                          Influence (%)
MAT subtest                Pretest  Posttest    Pretest      Squared          Sex     Year  Teacher
Word Knowledge                2.50      3.71      61.01          .30          .08      .03     2.69
Word Discrimination           2.82      3.70      58.11  [illegible]  [illegible]      .04     2.95
Reading                       2.51      3.62      53.17          .01          .01      .04     2.97
Arithmetic Computation        2.52      3.18      29.93          .92          .16      .22     5.66
Arithmetic Reasoning          2.52      3.25      40.12          .00          .00      .07     4.22
Verbal average                2.61      3.07      71.72          .24          .00      .03     2.98
Quantitative average          2.52      3.22      42.30          .24          .03      .05     5.02
Total                         2.57      3.44      70.38          .00          .02      .00     3.04

Note. Abbreviation: MAT = Metropolitan Achievement Test.


TABLE 3

INFLUENCES ON PREDICTION OF STUDENT LEARNING GAIN MEASURES IN THE GRADE 3,
TITLE I SAMPLE

                                    M                          Influence (%)
MAT subtest                Pretest  Posttest    Pretest  Squared          Sex     Year  Teacher
Word Knowledge                2.51      3.27      37.15      .90          .10      .33    17.64
Word Discrimination           2.82      3.34      44.33      .35          .65     1.85     4.96
Reading                       2.62      3.25      34.36      .36          .21      .16    10.86
Arithmetic Computation        2.76      3.44      29.89      .03          .41     1.18     9.34
Arithmetic Reasoning          2.76      3.05      26.63      .86          .36      .36     4.04
Verbal average                2.65      3.29      51.28     1.00          .16      .94    12.44
Quantitative average          2.76      3.25      34.40      .14  [illegible]      .87     6.31
Total                         2.70      3.27      50.08     1.16          .41     1.44     9.83

Note. Abbreviation: MAT = Metropolitan Achievement Test.


differences to socioeconomic status. At a 
given age level, the cognitive abilities and 
school achievement of lower-socioeconomic- 
status children are less advanced than 
those of their higher-socioeconomic-status 
peers. This can be seen in the present data 
by comparing (a) the means of Table 1 
versus those of Table 2 with (b) the means 
of Table 3 versus those of Table 4. Among 
other things, this means that the more ad- 
vantaged children were better able to learn 
on their own and/or from one another and 
thus were less dependent upon the teacher 
for their degree of success in mastering the 
curriculum; hence, the greater teacher effect
in Title I schools. 

6. Teacher impact appeared to be stronger 
on verbal skills than on quantitative skills 
in Title I schools and vice versa in non-Title I
schools. This was probably an elaboration
of the factor described in Paragraph 
5; socioeconomic-status differences are 
greatest on measures of verbal skills. 

7. Teacher impact was stronger in Grade 
3 than in Grade 2 in Title I schools but 
about equal in non-Title I schools. Again, 
this was probably an elaboration of the fac- 
tor described in Paragraph 5; by Grade 3, 
the socioeconomic-group differences, and 
thus the opportunity for teacher impact in 
Title I schools, were larger than they were 
in Grade 2 (note the larger differences in 
Grade 3 postscores than in Grade 2 postscores
in Tables 2 and 4). 

8. Predictability of posttest scores of 
pupils was generally greater in Grade 3 
than in Grade 2, in non-Title I than in 
Title I schools, and on verbal than on 


TABLE 4
INFLUENCES ON PREDICTION OF STUDENT LEARNING GAIN MEASURES IN THE GRADE 3,
NON-TITLE I SAMPLE

                                  M                            Influence (%)
MAT subtest               Pretest  Posttest   Pretest  Pretest squared     Sex       Year   Teacher
Word Knowledge              3.67     4.85      64.98        .67            .00        .02     2.16
Word Discrimination         3.66     4.62      67.08        .99            .03        .09     1.29
Reading                     3.52     4.65      57.09       1.47            .00        .15     1.41
Arithmetic Computation      3.14     4.13      33.84       1.03            .07        .07     6.94
Arithmetic Reasoning        3.23     4.23      50.55        .00            .02        .12     4.28
Verbal average              3.62     4.71      76.04       1.16            .00        .08     1.08
Quantitative average        3.19     4.18      54.89        .14            .00        .12     5.23
Total                       3.40     4.44      75.92        .39            .00        .12     2.16

Note. Abbreviation: MAT = Metropolitan Achievement Test. 


quantitative measures. The first and third 
conclusions have been discussed above. The 
greater predictability in non-Title I schools 
resulted from the higher correlations be- 
tween pre- and postscores in these schools. 
This, in turn, was due most likely to the 
combined effects of (a) the greater variance 
in scores in non-Title I schools, which made 
for higher correlations when coefficients 
were not corrected for attenuation and (b) 
the more advanced verbal and math skills 
of children in the non-Title I schools, which 
enabled them to answer correctly more 
often and to guess less often than children 
in Title I schools, whose scores very likely 
contained more error variance due to guess- 
ing or other response sets. 


CONSISTENCY OF TEACHER IMPACT 


The next step of the analysis addressed 
the question of the degree to which individ- 
ual teachers’ influence on child gain was 
consistent across three successive years, and 
hence, across classes of pupils. 

Residual gain scores for all pupils were 
obtained, using only simple pretest scores as 
covariates. These were then averaged for 
each teacher for each of her three classes. 
These average residual gains were then 
used to compute intraclass correlations 
among the three years for each of the four 
samples of teachers. Intraclass correlations 
(Ebel, 1951) provided indices of the con- 
sistency of pupil gain within teachers, 
across classes of pupils. The results of this 
analysis are shown in Table 5. With the 
exception of the second grade, Title I 
sample, in which none of the coefficients 
were statistically significant, it was appar- 
ent that three-year averages* were reason- 
ably reliable estimates of teacher impact on 
student learning. 
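
The consistency analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the one-way analysis-of-variance form of the intraclass correlation, and the example numbers are hypothetical and are not taken from the study, whose own coefficients were computed following Ebel (1951).

```python
from statistics import mean


def residual_gains(pre, post):
    """Residuals from a simple least-squares regression of posttest on pretest."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


def intraclass_correlations(table):
    """table[i][j] is the mean residual gain of teacher i's class in year j.

    Returns (single-year coefficient, coefficient for the k-year average),
    both taken from a one-way analysis of variance with teachers as groups.
    """
    n, k = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(table, row_means) for v in row
    ) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    averaged = (ms_between - ms_within) / ms_between
    return single, averaged


# Hypothetical mean residual gains for three teachers over three years
single, averaged = intraclass_correlations([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(round(single, 2), round(averaged, 2))  # 0.4 0.67
```

The second value returned is the one that corresponds to a coefficient for multi-year averages; the first shows how much lower the reliability of a single year's class mean would be for the same data.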


CONCLUSIONS 


The data show that reasonably stable 
estimates of teacher influence can be ob- 
tained from standardized achievement mea- 
sures of pupil performance when sample 


* These intraclass coefficients (Ebel, 1951) concern
three-year averages and are not averages of
the three possible two-year correlations. 


TABLE 5
INTRACLASS CORRELATIONS ACROSS THREE YEARS

                                 Grade 2                 Grade 3
MAT subtest               Title I  Non-Title I    Title I  Non-Title I
Word Knowledge              .43       .66*          .78*      .63*
Word Discrimination         .36       .74*          .26       .49*
Reading                     .24       .66*          .50*      .23
Arithmetic Computation      .00       .48*          .63*      .80*
Arithmetic Reasoning         –        .61*          .27       .64*
Verbal average              .95       .71*          .65*      .38*
Quantitative average         –        .59*          .50*      .75*
Total                       .19       .69*          .54*      .65*

Note. Abbreviation: MAT = Metropolitan Achievement Test. 
* p < .05. 


selection procedures eliminate new teachers 
and teachers who have recently switched 
grades. The increasing use of the team ap- 
proach in elementary schools, however, les- 
sens the practical interest of such measures. 
Also, although the stability coefficients from 
this study were considerably higher than 
those located by Rosenshine (1970), they 
were not high enough to justify the use of 
residual gains on such measures for teacher 
accountability purposes (Brophy, 1973). 

The differences between Title I and non- 
Title I schools were consistent with the 
theoretical position that the school is rela- 
tively more important, compared to the 
home, in determining the achievement levels 
of economically disadvantaged pupils than 
it is for advantaged pupils. This, in turn, 
suggested that the quality of teaching is 
more crucial in such settings than in advantaged
schools. 

A subsample of these teachers, selected 
because they showed the greatest consist- 
ency across four years in the degree of stu- 
dent learning gains they produced, is pres- 
ently being studied in an effort to establish 
those personal traits and classroom behav- 
iors which are associated with teacher ef- 
fectiveness in producing learning gains 
(Brophy & Evertson, 1973; Evertson & Bro- 
phy, 1973; Peck & Veldman, 1973). Having 
established that teachers have differential 
impacts in determining student scores on 
product measures, we are now attempting 
to identify the presage and process variables 
associated with these differences. 


A FOLLOW-UP STUDY OF TEACHER EXPECTANCY EFFECTS 


AND RICHARD J. WHEELER 


Further longitudinal data were presented from a study of teacher-bias
and teacher-expectancy effects on elementary school children's
achievement test performance. The analyses of the data, Stanford
Achievement Test (SAT) scores from the beginning and middle of the
1972-1973 academic year, revealed that the teacher-bias manipulation
had no effect and that the teacher-expectancy manipulation was
strongly related to SAT performance. Both findings were consistent
with the results previously reported. Since the children were at new
grade levels and had new teachers, the data were interpreted as
indicating that teachers do not bias the learning of children but are
good long-term predictors of children's academic capabilities. 


The purpose of this paper is to report further longitudinal data from
an investigation of teacher-bias and teacher-expectancy effects on
elementary school children's Stanford Achievement Test (SAT)
performance. In the earlier report (Dusek & O'Connell, 1973), it was
argued that rather than investigate teacher-bias effects by
manipulating teachers' expectancies for general intellectual
development (e.g., Claiborn, 1969; Rosenthal & Jacobson, 1968), a more
meaningful approach would be to manipulate teachers' expectancies in
terms of children's performance in specific academic subjects and then
to measure children's achievement in these areas. The value of
measuring the relationship between teachers' own self-generated
expectancies and children's SAT performance was also pointed out in
the original report. The rationale for this approach was based on
research (e.g., Rist, 1970) indicating teachers form their own
expectancies regarding students' potential in terms of performance in
academic subject areas. 


1 The project reported herein was performed pursuant to Grant
OEG 2-71-0516 from the U.S. Office of Education, Department of Health,
Education, and Welfare. The opinions expressed herein, however, do not
necessarily reflect the position or policy of the U.S. Office of
Education, and no official endorsement by the U.S. Office of Education
should be inferred. The authors are indebted to James McGee,
principal, and the second-, third-, fourth-, and fifth-grade teachers
of Clinton Elementary School for their kind cooperation. 

2 Requests for reprints should be sent to Jerome B. Dusek, Department
of Psychology, Syracuse University, Syracuse, New York 13210. 

In order to investigate both teacher-bias 
and teacher-expectancy effects, Dusek and 
O'Connell (1973) administered the SATs, 
disguised as tests aimed at predicting aca- 
demic potential, to two second- and two 
fourth-grade classrooms. At the same time, 
the teachers were asked to rank the children 
in their room on the basis of expected year- 
end performance levels for language and 
arithmetic skills. This ranking was taken as 
a measure of teacher expectancy for stu- 
dents’ performance. The names of 8 of the 
first 16 students ranked by the teacher in 
each classroom were given to the teacher 
along with a statement indicating that this 
group should show large academic gains in 
language and arithmetic skills during the 
coming year. The remaining 8 children 
formed a control group. This was taken as a 
manipulation of teacher bias, that is, a mea- 
sure of expectancy effects based on the inter- 
vention of the principal investigator. 

The dependent variable was the children’s 
SAT performance at the beginning, middle, 
and end of the academic year. The findings 
were very clear. First, there was no evidence 
of a teacher-bias effect on SAT performance 
for any of the three testing occasions. Second,
teacher-expectancy effects were very 
strongly related to SAT performance on all 
three testing occasions. Finally, teacher-bias 
and teacher-expectancy effects did not interact.
Clearly, simply telling teachers that 
students would perform well was not enough 
to alter the students’ SAT performance; this 
was a failure to replicate the findings of 
Rosenthal and Jacobson (1968). Teachers’ 
own judgments, however, were very good 
predictors of students’ performance. Based 
on the data in the original report, Dusek and 
O'Connell (1973) argued that these latter 
teacher-expectancy effects were, in all prob- 
ability, a reflection of the teachers’ ability 
to accurately estimate the academic po- 
tential of the children in their rooms rather 
than a bias effect in the Rosenthal and 
Jacobson (1968) sense. 

In the present paper, data bearing on the 
long-term effects of teacher bias and teacher 
expectancy are presented. At the beginning 
and middle of the 1972-1973 academic year, 
the SATs were again administered to the 
students, who were now in the third and
fifth grades. The purpose was to determine 
if teacher-bias or teacher-expectancy effects 
would be present in SAT performance even 
though the students had now advanced to 
new grade levels and had new teachers. 


METHOD 


Subjects 


Of the original 64 subjects, 38 were still available for testing
during the 1972-1973 academic year, including 22 now in the third
grade (mean chronological age was 8.4 years) and 16 now in the fifth
grade (mean chronological age was 10.0 years). A more complete
description of the sample and the basis for its selection may be found
in Dusek and O'Connell (1973).³ 


3 In order to examine whether the 13 subjects lost from June 1972 to
January 1973 were different from the 38 subjects remaining at the
conclusion of the experiment, both the design variables and the
criterion variables (SAT 1, 2, and 3 scores) were entered into a
multiple discriminant analysis. The F ratio approximation to Wilks'
lambda was 1.24 (df = 10/40), too small to reject the multivariate
null hypothesis of group equivalence on the vector of design and
criterion variables. In addition, chi-square analyses indicated no
differential loss of subjects as a function of sex of subject, teacher
ranking, time of SAT administration, experimental versus control
condition, or grade level. These analyses clearly demonstrated no
differential loss of subjects as a function of design variables. In
addition, the SAT 1, SAT 2, and SAT 3 scores of the 13 subjects lost
do not differ from the SAT 1, SAT 2, and SAT 3 scores of the 38
subjects available. 


Tests 


In September and January of the 1972-1973 academic year, the
following subtests from the SAT Primary 2 and Partial Intermediate 1
batteries were administered to the third and fifth graders,
respectively: Word Reading, Paragraph Meaning, Spelling, Arithmetic
Computation, and Arithmetic Concepts. An alternate form of each test
battery was administered on each testing occasion. As during the
previous year, tests were disguised as measures of academic potential. 


Procedure 

The procedures were essentially the same as those described by Dusek
and O'Connell (1973). The third- and fifth-grade teachers were told
that the tests to be administered were developed to predict academic
potential in language and arithmetic skills and were being
administered for standardization and validation of the data collected
the previous year. Since the tests were administered under this guise
and since the primary interest was in the long-term effects of
teachers' expectancies on performance, these teachers were not asked
to rank the children. 


Design and Analysis 


There were three factors in the design: grade level, experimental
condition, and teacher ranking from the 1971-1972 academic year. Since
26 of the original 64 subjects were no longer available for testing,
there was not an equal number of subjects in each cell of the design.
For this reason, the data were analyzed by the multiple regression
technique described by Cohen (1968) and Overall and Spiegel (1969).
The application of this technique to the present experiment was
spelled out in detail in the initial report. 


RESULTS 


As in the previous report, the dependent variables were total SAT 4
and SAT 5 scores, from the testing at the beginning and middle of the
year, respectively. In addition, the SAT 1 total score was analyzed
for the 38 subjects still available from the original sample. The
results of the multiple regression analyses are summarized in Table 1.
The means associated with the main effects in each analysis are
presented in Table 2. The analysis of the SAT 1 scores for the reduced
sample revealed effects parallel to the original SAT 1 analysis, that
is, the only 

TABLE 1
MULTIPLE CORRELATIONS AND F RATIOS FOR MAIN EFFECTS

                              Stanford Achievement Test
Main effect                    1           4           5
Experimental condition
  R² full                    .3958       .4229       .4598
  R² reduced                 .3958       .4213       .4537
  Difference                 .0000       .0016       .0061
  Fᵃ                         .0000     < 1.0000    < 1.0000
Grade level
  R² full                    .3958       .4229       .4598
  R² reduced                 .3953       .3093       .4504
  Difference                 .0005       .1136       .0004
  Fᵃ                       < 1.0000      6.69*     < 1.0000
Teacher ranking
  R² full                    .3958       .4229       .4598
  R² reduced                 .0272       .0714       .0105
  Difference                 .3686       .3516       .4493
  Fᵃ                        20.74**     20.71**     28.28**

ᵃ df = 1/34. 
* p < .01. 
** p < .001. 


significant effect was teacher ranking. 
Teacher ranking was also a significant effect 
in the analyses of SAT 4 and SAT 5 scores. 


In all three cases, students who were ranked 
higher by the teacher had higher SAT total 
scores than students who were ranked lower. 
The correlations between SAT scores and 
teacher ranking were -.63, -.55, and -.67 
for SAT 1, SAT 4, and SAT 5, respectively. 

Grade level was a significant effect only in 
the analysis of SAT 4. The fifth graders had 
a higher mean score than the third graders. 
The experimental condition produced no sig- 
nificant effects in any of the analyses. None 
of the interactions was significant. 
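
The F ratios in Table 1 can be checked from the tabled R² values. The sketch below assumes the standard formula for testing the drop in R² when one effect is removed from the full regression model, with the 1 and 34 degrees of freedom given in the table note; the function name is illustrative and not part of the original report.

```python
def incremental_f(r2_full, r2_reduced, df_effect, df_error):
    """F ratio for the loss in R-squared when one effect is dropped from the full model."""
    return ((r2_full - r2_reduced) / df_effect) / ((1.0 - r2_full) / df_error)


# Teacher-ranking rows of Table 1: (R-squared full, R-squared reduced), df = 1/34
teacher_ranking = {
    "SAT 1": (0.3958, 0.0272),
    "SAT 4": (0.4229, 0.0714),
    "SAT 5": (0.4598, 0.0105),
}
for test, (full, reduced) in teacher_ranking.items():
    print(test, round(incremental_f(full, reduced, 1, 34), 2))
# prints 20.74, 20.71, and 28.28, which agree with the tabled F ratios
```

The same function applied to the grade-level row for SAT 4 (.4229 full, .3093 reduced) gives 6.69, again matching the table.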


DISCUSSION 


The results presented above substantiate 
the findings and interpretations of the previ- 
ous report in several ways. First, the 
teacher-bias manipulation had no effect on 
SAT 4 or SAT 5 performance. Taken to- 
gether, the findings in the two reports clearly 
indicate that simply telling teachers that 
students will perform well is not enough to 
alter students’ performance. These findings, 
as well as those of other investigators (e.g., 
Claiborn, 1969), are clear failures to replicate
Rosenthal and Jacobson's (1968) research,
and they argue that teachers do not 
bias either the intellectual development or 
achievement of children of elementary 
school age. 

Second, teacher ranking was strongly and 
consistently related to the children’s SAT 4 
and SAT 5 performance, just as it was to 
their SAT 1, SAT 2, and SAT 3 performance 
in the initial report. Clearly, the relationship 
between teacher ranking and SAT perform- 
ance is not a teacher-bias effect in the 
Rosenthal and Jacobson (1968) sense. First, 
as Dusek and O’Connell (1973) noted, it is 
unlikely that the relationship of teacher 
ranking to the SAT 1 scores would be as 
strong after only two weeks of classes if the 
rankings were, in large part, formed from 
bases irrelevant to the students’ academic 
ability. Second, during the academic year of 
SAT 4 and SAT 5 testing, the students were 
no longer under the tutelage of the teacher 
who made the rankings. If the rankings were 
a reflection of teacher bias, the strength of 
the relationship to SAT performance should 
have been dissipated. Obviously, this was 
not the case. 

TABLE 2
MEAN STANFORD ACHIEVEMENT TEST (SAT) SCORES FOR EACH MAIN EFFECT

                              M SAT scoreᵃ
Main effect                1         4         5
Condition
  Experimental           59.16     75.53     84.21
  Control                54.26     68.58     81.11
Grade
  Third                  58.77     65.09     84.91
  Fifth                  53.88     81.03     79.56
Teacher rankingᵇ
  1-4                    78.00    101.75    116.50
  5-8                    57.88     67.13     86.50
  9-12                   57.18     76.36     71.82
  13-16                  39.91     49.73     60.09

ᵃ SAT 1 = October 1971, SAT 4 = September 1972, and SAT 5 = January 1973. 
ᵇ Teacher ranking was entered as a continuous variable in the multiple
regression analysis. The data are grouped here simply for convenience. 


Dusek and O'Connell (1973) argued that 
the effects associated with teacher ranking 
were simply a reflection of the teachers'
accuracy at estimating the academic ability 
levels of the children in their classrooms. 
The longitudinal data presented above sup- 
port this hypothesis. What is not known, at 
present, are the bases teachers use to form 
expectancies for students’ performance. The 
data in the present paper suggest that 
teachers’ expectancies are based on criteria 
relevant to academic performance rather 
than on criteria related primarily to social 
class, as was reported by Rist (1970). If this 
is the case, teachers do not bias the education
of children. 


BLACK AND WHITE CHILDREN'S COMPREHENSION OF 
STANDARD AND NONSTANDARD ENGLISH PASSAGES 


SAMUEL J. MARWIT² AND GAIL NEUMANN 
University of Missouri—St. Louis 


Two black and two white examiners administered standard English 
and nonstandard English forms of the Reading Comprehension section 
of the California Reading Test to 60 black and 53 white second 
graders. The hypotheses that black subjects comprehend nonstandard 
English materials better than those in standard English and that white 
subjects comprehend standard English materials better than those in 
nonstandard English were not supported. Within each form, white 
subjects generally obtained higher scores than black subjects, and 
within each race, standard English presentations generally resulted 
in higher scores than nonstandard English presentations. Black
subjects performed as well as white subjects under the white
examiner – standard English condition only. Results were discussed in 
relation to other studies of the linguistic interference hypothesis. 


Much of the current sociolinguistic literature (Williams, 1970) has
been concerned with the notion that black children, especially those
of low socioeconomic background, speak a "nonstandard" form of English
which differs from that used by their white peers. Ethnographical
(Bauman, 1971) and empirical (Marwit, Marwit, & Boswell, 1972)
investigations have documented the internal consistency of black
nonstandard English and the reliability and predictability of its
grammatical and phonological rules. The general conclusion has been
that black nonstandard English constitutes a different, rather than a
deficient, language in relation to standard English and that it needs
to be accepted as such. 

Recently, attention has focused on the implications of these findings
for the education of the black child. Some (Saville, 1971) contend
that emotional and cognitive interference phenomena are operating
unfairly upon the nonstandard, English-speaking black child, since he
is required to perform in a standard, English-oriented school setting
and to compete with others who have used standard English all of their
lives. While some criticism has focused on the nonacceptance of spoken
and written nonstandard English in the classroom (Johnson, 1971), much
more has been directed against the nonavailability of "black-language"
reading materials for black students. Wolfram (1970) and others
(Baratz & Shuy, 1969) have suggested that the low level of reading
skills traditionally displayed by black children results from the
linguistic interference produced by the standard English format of
texts and that reading proficiency, especially comprehension, would be
greatly enhanced by translating standard English materials into
nonstandard English. Support for the direct relationship between
reading comprehension and similarity of written and oral language
patterns has been provided by Ruddell (1965). Yet, findings by Nolen
(1972) and Torrey (1971) which directly addressed the black child's
comprehension of written standard English have refuted this. The
present study is designed to investigate this relationship further by
having black and white examiners administer standard and nonstandard
English forms of a formal reading comprehension test to an interracial
population which had previously (Marwit et al., 1972) demonstrated
sociolinguistic differences. On the basis of earlier speculations
(Marwit et al., 1972), it is hypothesized that black children obtain
higher reading comprehension scores when the instructions, test
materials, and test questions are presented in a nonstandard English
format than when they are presented in a standard English format, that
the reverse will be true for white children, and that no subject-race
differences will occur among children reading passages written in
their native oral dialect. 


2 Requests for reprints should be sent to Samuel J. Marwit, Department
of Psychology, University of Missouri, St. Louis, Missouri 63121. 


METHOD 


Subjects 


Subjects were 60 black and 53 white second graders from a St. Louis
County public school system in which significant subject-race
differences in oral language had been previously demonstrated (Marwit
et al., 1972). Two white and two black undergraduate males served as
examiners, each testing one group of subjects using a standard English
format and a second group using a nonstandard English format. Random
assignment of subjects within each race to each of the four conditions
(black examiner – standard English, black examiner – nonstandard
English, white examiner – standard English, white examiner –
nonstandard English) resulted in 15 black subjects per condition, 13
white subjects per each of the first three conditions, and 14 white
subjects in the last condition. 


Test Materials 


Test materials consisted of standard English and nonstandard English
forms of the Reading Comprehension section of the California Reading
Test for first- and second-grade students.³ The nonstandard English
form was prepared by asking two black St. Louis-born teachers to
translate the instrument's instructions, reading passages, and test
questions into the language of the black, St. Louis school child.
Interinterpreter agreement was uniformly high and confirmed
expectations based on earlier research findings (Marwit et al., 1972). 

An example of a standard English paragraph is, "Roy plays with the
cow. The cow's name is Spotty. Nancy plays with the goat. Its name is
Blacky. [p. 7]." The nonstandard translation reads, "Roy, he play with
the cow. The cow name Spotty. Nancy, she play with the goat. Its name
Blacky." 


3 Adapted from the California Reading Test, Lower Primary, Form X,
devised by Ernest W. Tiegs and Willis W. Clark. Copyright © 1957, 1963
by McGraw-Hill, Inc. By permission of the publisher, CTB/McGraw-Hill,
Del Monte Research Park, Monterey, California 93940. All Rights
Reserved. Printed in U.S.A. 


TABLE 1
MEAN READING COMPREHENSION SCORES

                                     Format
                          Standard          Nonstandard
Subject    Examiner       M      SD         M      SD
Black      Black         4.47   2.97       4.60   2.80
           White         8.47   4.26       4.47   2.50
White      Black         9.54   4.10       7.15   4.34
           White         8.31   4.59       8.57   3.99


RESULTS 


Table 1 presents the mean reading com- 
prehension scores for black and white subjects
tested by black and white examiners 
using standard and nonstandard English 
formats. A 2 x 2 x 2 (Subject Race X 
Examiner Race X Format) analysis of var- 
iance resulted in three significant effects: a 
main effect of subject race (F = 16.95, df 
= 1/105, p < .001) with white subjects 
receiving higher comprehension scores than 
black subjects; a main effect of format (F 
= 4.47, df = 1/105, p < .05) with subjects 
achieving higher scores under standard 
English than under nonstandard English 
format; and a triple interaction (F = 5.79, 
df = 1/105, p < .05) produced by black 
subjects obtaining significantly higher 
scores under the standard English format 
when tested by white examiners than when 
tested by black examiners (F = 8.62; F'(.05) 
= 8.34; Scheffé, 1953), while no comparable
differences occurred for black subjects 
under nonstandard English format or for 
white subjects under either format. 
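
The error degrees of freedom reported above follow from the design. The short check below is a sketch that assumes 13 white subjects in each of the first three conditions, the only cell size consistent with the stated total of 53 white subjects; the dictionary keys are illustrative labels, not the authors' notation.

```python
# Cell sizes for the 2 x 2 x 2 design: (subject race, examiner race, format).
# The white-subject cell sizes of 13 are an assumption (13 + 13 + 13 + 14 = 53).
cells = {
    ("black", "black", "standard"): 15,
    ("black", "black", "nonstandard"): 15,
    ("black", "white", "standard"): 15,
    ("black", "white", "nonstandard"): 15,
    ("white", "black", "standard"): 13,
    ("white", "black", "nonstandard"): 13,
    ("white", "white", "standard"): 13,
    ("white", "white", "nonstandard"): 14,
}

n_total = sum(cells.values())      # all subjects tested
df_error = n_total - len(cells)    # within-cell (error) degrees of freedom
print(n_total, df_error)  # 113 105
```

With 113 subjects in 8 cells, the error term has 105 degrees of freedom, matching the df = 1/105 reported for each F test.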


DISCUSSION 


The present results failed to support the 
hypotheses. Black second graders did not 
comprehend reading materials better when 
they were presented in a nonstandard English
format as opposed to a standard English
format nor did their white peers comprehend
standard English significantly better
than nonstandard English. Furthermore, 
subject-race differences were generally 
maintained despite changes in format. 
These findings refute the contention of 
those who hold that linguistic interference 
between written and spoken language is re- 
sponsible for most of the reading difficulties 
encountered by black children (Baratz & 
Shuy, 1969) and suggest that alternative 
explanations be explored. 

While failing to support the present hy- 
potheses, these results were not unsupported 
by others in the sociolinguistic and educa- 
tional literature. They were confirmed by 
the findings of a number of investigations 
dealing with the black child’s ability to 
comprehend both written (Nolen, 1972; 
Torrey, 1971) and spoken (Hall, Turner, & 
Russell, 1973) standard English. In each of 
these investigations, the presence of a relia- 
ble, nonstandard English dialect as the 
black child’s oral language pattern was 
documented, as was the failure of this oral 
dialect to interfere with the comprehension 
of standard English. Where white subjects 
were employed, similar nonsignificant Subject
Race X Format interactions were obtained.
These findings, taken together with 
those of the present study, suggest that 
either linguistic discrepancies due to subject 
race are not pronounced enough to impair 
comprehension or are simply unimportant 
to the process of understanding. It has yet 
to be demonstrated that direct oral-oral or 
oral-visual correspondence is essential for 
comprehension of verbal and written mes- 
sages among speakers whose basic vocabu- 
laries and sentence structures are roughly 
equivalent. 

The significant main effects of subject 
race and of format were not by themselves 
relevant to the present hypotheses. The 
means comprising the two-way interaction 
of these variables, while not significantly 
different from each other, were interesting. 
Within each format, white subjects ob- 
tained higher comprehension scores than 
black subjects. This was contrary to the 
hypotheses and has already been discussed. 
Within each subject race, higher scores were 
elicited by the standard English format 
than by the nonstandard English format, 
with the discrepancy being more pro- 
nounced among black subjects. It is not 
surprising that white children performed 
slightly better when exposed to standard 
English, since this is the language they 
have heard and spoken all their lives. How- 
ever, black children’s achievement of higher 
test scores under standard English condi- 
tions was unexpected; this might be a func- 
tion of either their familiarity with stand- 
ard English as the expected and accepted 
language of the classrooms and therefore 
the one which demanded performance or, 
conversely, a function of their distrust of 
nonstandard English in a setting where it 
was rarely, if ever, used and almost never 
rewarded (Johnson, 1971). Observation of 
the means (Table 1) entering into the Sub- 
ject Race X Examiner Race X Format in- 
teraction provides further support for either 
of these interpretations. This interaction 
was the result of black subjects performing 
as well as white subjects under the single 
circumstance of a white authority figure us- 
ing standard English. If it can be assumed 
that the results of this condition were not 
due to chance, then it is reasonable to sug- 
gest that black second graders can compre- 
hend written material as well as white sec- 
ond graders but manifest this only under 
specified conditions. Research designed to 
investigate the facilitating and suppressing 
effects of various subject race-examiner
race combinations and of various expectancy
confirmation-disconfirmation circumstances
is being prepared and will hopefully
provide more information.

VARIABILITY IN CHILDREN'S COMPREHENSION OF
SYNTACTIC STRUCTURES¹


ALAN M. LESGOLD²
University of Pittsburgh 


Data are presented that challenge the difficulty ordering for anaphoric
syntax (e.g., pronouns) proposed by Bormuth, Manning, Carr, and
Pearson. It is suggested that any such difficulty ordering which results
from tests of the form proposed by Bormuth has uncontrolled
variability due to semantic factors that have yet to be carefully
analyzed and controlled.


Bormuth, Manning, Carr, and Pearson 
(1970) have reported results suggesting 
that stable difficulty orderings could be ob- 
tained for three classes of syntactic forms 
containing a total of 55 separate variants. 
There were difficulty differences both between
and within the three classes (intrasentence,
intersentence, and anaphora³),
from which they argued that these cate- 
gories may be related to stages in a learning 
hierarchy for syntax comprehension skills. 
Further, there have been recent suggestions 
that additional experimentation of this sort, 
combined with linguistic analysis, will result
in discovery of such a hierarchy (e.g.,
Carroll, 1972; Frase, 1972). The present results,
a partial replication of their work,
suggest that this approach is unlikely to 
succeed unless such factors as semantics 
and constraints on information-processing 
capacity are concurrently considered. 

The procedure Bormuth et al. (1970) 
used in determining the difficulty of the 
various syntax forms was simple and, at 


¹ This research was supported by the Learning
Research and Development Center, University of
Pittsburgh, through funds from the National
Institute of Education. The experiment was conducted
by Hildrene De Good. Karen Block, Roberta
Golinkoff, and Charles Perfetti read an earlier
draft and helped in improving it.

² Requests for reprints should be sent to Alan
M. Lesgold, LRDC, University of Pittsburgh,
Pittsburgh, Pennsylvania 15260.

³ Anaphora is the term used to denote a structure
in a sentence, for example, a pronoun, that
derives its meaning from a previous sentence or an
earlier part of the present sentence.


first glance, straightforward. For a given 
anaphora form, for example, one sentence 
was written and then a second was con- 
structed with some reference back to the 
first (e.g., John went to the store. He
bought a pear.). A paragraph was then constructed
around the two-sentence cluster. A
question was generated for the paragraph 
by substituting a wh- word (Bormuth, 
1970) for the anaphora (e.g., Who bought a 
pear?). Finally (for anaphora only), multi- 
ple-choice alternative answers were written. 
The item thus consisted of a paragraph, a 
question, and a number of alternative an- 
swers from which to choose. 

There is a basic problem in using procedures
of this sort for measuring the difficulty
of one syntactic structure relative to
another. This is the confounding of syntax
with semantics. Consider a potential comparison
between a personal pronoun structure
(e.g., he) and a pro-clause form (in
which that or so might stand for an entire 
clause). The two syntactic forms occur in 
different semantic contexts: there is no se- 
mantic (deep) structure that can alterna- 
tively be expressed as either a personal 
pronoun or a pro-clause. Now this does not 
necessarily mean that a difficulty ordering 
such as that of Bormuth et al. should be 
discounted. After all, the confounding of 
syntax and semantics may, for such pur- 
poses, be complete. 

If the Bormuth et al. (1970) difficulty 
ordering is consistently replicable, then we 
can still look for a learning hierarchy related
to that data, even though we would
not know, initially, the relative roles of 
syntax and semantics. On the other hand, it 
may be that the difficulty ordering is not 
stable, that the particular confoundings of 
syntax and semantics in the item forms of 
Bormuth et al. are only partial. Then, if 
semantic content is not controlled, spurious 
measurements of syntax-processing ability 
may result. If the latter is the case, then a 
difficulty ordering is premature until a se- 
mantic analysis as complete as Bormuth’s 
(1970) syntactic analysis is available. The 
present results are a demonstration that the
difficulty ordering of syntactic structures is 
not stable, implying that such further anal- 
yses are necessary. 

There is another problem in measuring 
syntax difficulty. Sometimes the answer to a
question may be betrayed by semantic con- 
straints. The paragraph may contain only 
one semantically possible answer to the 
question. For example, a who question after 
a paragraph with only one animate noun 
could be answered without knowledge of the 
target’s syntactic structure. Such an ex- 
treme problem is not likely in a careful 
study such as that of Bormuth et al. More 
generally, though, semantic constraints on 
the answer to a question by the choice of 
content words for the paragraph are diffi- 
cult to determine and may not be com- 
pletely controlled in studies such as that of 
Bormuth et al. or the present experiment. 
Again, this may be a moot point if such 
constraints are perfectly correlated with 
syntax, but the present results rule out this 
possibility. 

In addition to these two—semantic diffi- 
culty and extent to which the paragraph 
“gives away” the answer—there is a third 
potential source of variance in the difficulty 
ordering produced by the Bormuth et al.
(1970) method. Two passages that have a
target syntactic structure in common may 
differ in the extent to which they can other- 
wise be processed to the point at which the 
critical structure is relevant. One may sen- 
sibly hypothesize, for example, that an ana- 
phora cannot be comprehended unless both 
it and its antecedent are simultaneously in 
operating (short-term) memory. 

Consider the following example sentences:


1. Joe may cry. If so, the rest of us will 
be sad. 

2. Joe is coming home. That is the best 
news I've heard all week. 

3. Joe may splice the mainbrace. If so, 
the rest of us will be glad. 


Bormuth et al. found that passages like 
Item 1 are comprehended 87% of the time 
while those like Item 2 are comprehended 
only 66% of the time, by fourth graders. If 
they had used a different pro-clause passage
like Item 3 instead of Item 1, perhaps only
half of the children would have understood
pro-clause forms, thus reversing the ranking 
of clause demonstratives (that) and pro- 
clauses (so). Item 3 is longer than Item 1 
and may be more difficult to encode due to 
its idiomatic content. Most important, so 
stands in place of a much more complex 
construction in Item 3 than in Item 1. Un- 
less a child can record splice the mainbrace 
as a single image, he may lack the ability 
to process that phrase to the point of being 
able to resolve the anaphoric reference. No 
control procedure is available to insure that 
Bormuth’s ranking arises only from syntax 
differences and not from differences in pas- 
sage wording or the amount of processing 
required to get from a syntactic parsing of 
a sentence to an underlying cognitive repre- 
sentation. 

The present results arise from what was 
expected to be a screening task for an ex- 
periment on memory for syntax. Thus, they 
provide information about only a subset of 
9 out of Bormuth et al.'s 14 anaphora
forms. The differences between the two
studies are as follows: (a) the present
study used oral, constructed responses while
Bormuth et al. used written, multiple-choice
responses; (b) the present study explicitly
controlled the number of semantically
plausible potential answers in each
passage; (c) the location of the target 
structure in the passage was counterbalanced 
in the present study; and (d) Bormuth et 
al. used 420 fourth-grade subjects while the 
present study used 80 subjects from the 
third and fourth grades. 


METHOD 


Subjects 


Forty students from a campus laboratory school 
and 40 students from an urban public school par- 
ticipated as subjects. The campus group ranged in 
age from 8.0 to 10.0 years with a median age of 8.8 
years, while the urban group ranged from 8.0 to 
11.5 years with a median age of 9.4 years. All were
in third or fourth grade.


Materials 


Fourteen anaphora forms were tested in this 
study. They are listed, with examples, in Table 1. 
Three items were written for each form. An item 
consisted of a paragraph plus a question. The para- 
graph, in turn, consisted of two filler sentences plus 
a two-sentence critical-structure sequence. The first 
sentence of the critical sequence contained an ante- 
cedent which was referenced by an anaphora in the 
second sentence. Each of the three items for a 
form had a different location for the critical sen- 
tences in the paragraph: before, between, or after 
the filler sentences. Each item was constructed so
that there were two semantically sensible answers 
to the question. The correct choice was determined 
by the anaphora syntax. The question for each 
item was written by substituting the appropriate 
wh- word for the anaphora and then applying the 
shortest sequence of transformations that would 
turn the anaphora sentence into a question. 

Each question was typed on an 8½ inch × 11
inch sheet in .42-centimeter gothic type and covered
with a clear plastic sleeve. The 42 pages were
presented to subjects in a loose-leaf binder. Order 
of occurrence of the items was approximately 
counterbalanced. 


Procedure 


The procedure was first explained to the sub- 
ject. Each subject worked individually and at his 
own pace reading each paragraph and then orally 
answering the question. The subjects from each 
school were split into two groups of 20 each. One 
(generally silently), while 
the other group followed along reading (silently) 
as they listened to a tape recording of the passage. 
After completing 30 of the 42 passages, each sub- 


by he?). 


Each response to a passage by a given subject 
was punched onto a separate computer card. A list 


⁴ The campus laboratory school actually had
two-year groupings, one of which was equivalent 
to third and fourth grades. 
was then produced, sorted by item. This made it
possible to examine all answers to an item at once.
Each answer was scored as correct only if the referent
of the anaphora in the given passage was
stated or implied by a stated synonym or superordinate
category term. Answers referring to more
entities than the correct referent were counted 
wrong. Scoring was verified by a second observer. 


RESULTS 


Two reliability measures were computed. 
The cell means for the 14 anaphora types 
were compared for the two schools (r = .94, 
df = 12, p < .001) and for the two presentation
conditions (r = .88, df = 12, p < .001);
both correlations are quite high, suggesting
that these results are relatively stable
for the particular items used.

To determine whether the anaphora 
forms differed in their difficulty, a 20 × 14
× 2 × 2 (Subjects × Anaphora × Schools
× Presentation Mode) partially nested
analysis of variance was performed on the
scores (0-3 correct) of the subjects on each
of the anaphora forms. There were significant
differences among the anaphora forms
(F = 19.6, df = 13/988, p < .0001), but
there were no interactions of anaphora with
school or presentation mode (Fs < 1.28, df
= 13/988). The percentage correct for each
of the anaphora types is shown in Table 1,
and Tukey's (a) post hoc comparisons are
shown in Table 2.
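
The degrees of freedom reported for this analysis can be checked against the design. The sketch below is only an arithmetic check, assuming (as the Method section states) four School × Presentation Mode groups of 20 subjects each and 14 anaphora forms; the variable names are illustrative and do not come from the original analysis.

```python
# Arithmetic check of the reported degrees of freedom (illustrative names).
n_schools = 2            # campus laboratory school, urban public school
n_modes = 2              # read only, read and hear
subjects_per_group = 20
n_forms = 14             # anaphora forms tested

n_groups = n_schools * n_modes
n_subjects = n_groups * subjects_per_group

# Between-subjects error: subjects nested within the four groups.
df_between_error = n_subjects - n_groups

# Within-subjects effect of anaphora form and its error term.
df_forms = n_forms - 1
df_within_error = df_forms * df_between_error

print(df_forms, df_between_error, df_within_error)  # 13 76 988
```

These values agree with the 13/988 reported for the anaphora effect and with the error term of 76 reported for the schools effect.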

Schools was a significant variable (F =
27.5, df = 1/76, p < .0001) with the campus
lab school (M = 2.42) showing higher
performance than the urban public school
(M = 2.04). There is also some reason to
suspect that while the urban school subjects
performed equally under the two presentation
conditions (Ms = 2.03, 2.04), the campus
lab school subjects benefited from the
read-and-hear condition (Ms = 2.54, 2.29;
F = 2.89, df = 2/76, p < .10).

The important result is the comparison of 
the rank orders of difficulty for those nine 
anaphora conditions common to both the 
present study and to that of Bormuth et al. 
(1970). The mean percentages correct can 
be seen in Table 1. There is a significant 
negative correlation between the two sets of 
means (Spearman ρ = -.66, n = 9, p < .05).
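
A rank correlation of this kind can be sketched in a few lines. The example below is a hedged illustration, not the original computation: it uses only the six common forms whose percentages are both legible in Table 1 as reproduced here (negated pronoun, the two pro-verb forms, pro-clause so, clause demonstrative, and semantic substitute for a noun), so its result is not the coefficient reported for all nine forms. The function names are hypothetical, and the ranking helper assumes no tied values.

```python
def ranks(values):
    """Return 1-based ascending ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Percentage correct for the six forms legible in both columns of Table 1.
bormuth = [67.4, 82.8, 76.1, 86.8, 60.3, 65.5]
present = [67.5, 60.8, 54.6, 64.2, 90.0, 82.9]

rho = spearman_rho(bormuth, present)
print(round(rho, 3))  # -0.771
```

Even on this partial subset the two difficulty orderings run in opposite directions, which is the pattern the text describes.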


TABLE 1
RELATIVE DIFFICULTY OF ANAPHORIC STRUCTURES (PERCENTAGE CORRECT)

Anaphora type               | Example                                                        | Bormuth et al.* | Present study
Personal pronoun            | Joe left the room. He had ....                                 | 64.5            | [illegible]
Relative pronoun            | man who lives next door. ...                                   | 82.8            | [illegible]
Noun phrase, demonstrative  | The old dog belongs to Joe. That is his....                    | 81.5            | [illegible]
Negated pronoun             | The gang went shopping. No one bought ....                     | 67.4            | 67.5
Pro-verb, so-do             | John likes Mary. So does Bill.                                 | 82.8            | 60.8
Pro-verb, so-be/have        | Joe is sick. So is Bill.                                       | 76.1            | 54.6
Pro-adverb, locative        | I am upstairs. It is cold there.                               | --              | 79.2
Pro-clause, so              | Joe may go. If so, we will....                                 | 86.8            | 64.2
Clause, demonstrative       | Joe is dead. That leaves two of us.                            | 60.3            | 90.0
Pro-adjective               | John is careful. Bill is that way, too.                        | --              | 66.2
Semantic substitute: noun   | Those dishes are expensive, but this china....                 | 65.5            | 82.9
Semantic substitute: verb   | Jim shot first. John fired, too.                               | --              | 82.5
Semantic substitute: clause | Bill went to the bank. That trip made us short one outfielder. | --              | 75.0
Semantic substitute: adverb | John works carefully. Bill also works precisely.               | --              | 70.8

* Bormuth, Manning, Carr, and Pearson (1970).


DISCUSSION


Since Bormuth et al. (1970) suggest that 
their difficulty ordering may be the basis for 
a learning hierarchy for anaphora syntax, it 
is important to be certain that their data 
and the present data cannot possibly be a 
basis for the same hierarchy. The negative 
rank correlation gives some evidence in this 
regard. However, it would be possible to 
find such a correlation if one subject sample 
possessed the skills needed to comprehend 
Structure A while a second subject sample 
had not yet acquired those skills. If both 
samples performed at a middle level on 
Structure B, then the first would display a 
rank ordering of A > B while the second 
had ordering B > A. 

Many structures of language can only be 
understood in all contexts after a set of
several interpretation rules have been 
learned. 

Consider the following example sen- 
tences: 


4. He had won the race. 
5. John knew he had won the race. 


For example, a six-year-old knows enough 
of the rules for understanding personal 
pronouns to know that in Item 4, he refers 
to the person most recently talked about. 
However, the child would assume that Item 
5 referred to two different people (Chom- 
sky, 1969; Palermo & Molfese, 1972). 
When two syntactic forms are tested without
knowledge of the rules or strategies that
different-age children use to comprehend
them, it is quite conceivable that their relative
difficulty will oscillate as new rules are
learned that enable one structure or the
other to be understood in more contexts.
This is not the case for the present data,
since some forms, such as the two pro-verb 
forms and the pro-clause (so) form, are 
better comprehended by the Bormuth et al. 
(1970) subjects, while others, such as the 
personal pronoun, are better comprehended
by the present subjects. Since these order 
differences are absolute rather than rela- 
tive, we conclude that the Bormuth et al. 
procedure suffers from a problem more 
basic than developmental variability. 

The present study was intended as a
screening task for an experiment derived
from Bormuth's work. Thus, every effort
was made to reproduce any potentially critical
procedures of Bormuth's. Results obtained
are statistically reliable. We conclude
that syntax is not the basis of a hierarchy
of comprehension skills that have not
been completely acquired by the time children
are in the fourth grade.


Anaphora type

1. Pro-verb, so-be/have
2. Pro-verb, so-do
3. Pro-clause, so
4. Pro-adjective
5. Negated pronoun
6. Semantic substitute: adverb
7. Semantic substitute: clause
8. Noun phrase, demonstrative
9. Relative pronoun
10. Pro-adverb, locative
11. Semantic substitute: verb
12. Semantic substitute: noun
13. Clause, demonstrative

[Matrix of pairwise comparisons not recoverable.]

* p < .05.
** p < .01.

Bormuth et al. (1970) claimed to have 
less confidence in their anaphora results be- 
cause of the multiple-choice procedure used 
exclusively in that part of their study. 
Thus, one could argue that what the present 
paper has really done is to correct a prob- 
lem in Bormuth's procedure (Bormuth, per- 
sonal communication, October 1973), thus 
giving a more correct syntax-learning hier- 
archy. However, there are very few syntax 
forms which children in the fourth grade 
cannot understand in at least some con- 
texts. Several of the “hardest” forms in the 
Bormuth et al. study are “easy” in the
present data. 

Why, then, do children perform poorly on 
a particular structure on even one of the 
two tests? There are two potential reasons:
(a) the child may not know the
interpretation rules required to understand
the structure in a particular semantic context;
or (b) he may lack the real-time capacity
for applying those rules. There is
good reason to believe that both of these
factors play a role but that the second is of
predominant importance for children as old
as the present subjects. The available evidence
is primarily restricted to studies of
pronominalization. 

TABLE 2
POST HOC COMPARISONS BETWEEN ANAPHORIC STRUCTURES (TUKEY'S a)


Fredrick, Golub, and Johnson (1970)
have found that multiple-choice pronoun
items similar to those of Bormuth et al.
(1970) are correctly answered only 28% to
50% of the time. The items are grammatically
similar to the personal and relative
pronoun items of the present study, which 
are correctly answered 92% and 78% of the 
time, respectively. The apparent difference 
is in the level of semantic ambiguity in the 
two cases. Consider the following examples, 
from Fredrick et al. and the present study, 
respectively: 


6. The notebook on her desk covered up 
my drawing which was very messy. 
[Identify which.]

(a) notebook 

(b) desk 

(c) covered up 
(d) drawing 
(e) messy 


7. Two men were walking down the street.
One man had on a hat. The big man who 
was standing on the corner is my father. 
Who was standing on the corner? 


The first example (Item 6) is semantically 
more complicated and requires that the 
child realize that the “closest-semantically- 
acceptable-antecedent” rule applies, even 
though there are three semantically accept- 
able potential antecedents, while there are 
only two potential answers to Item 7. Even 
children who get Items 6 and 7 correct are 
quite likely to waver in handling Item 8. 


8. I put the package on the table. Because
it was tilted, it fell off. [Identify it.] 
(a) package 
(b) table 
(c) tilted 
(d) fell 
(e) I 


The point is that grammatical rules for 
nine-year-old children are not abstract 
structures that apply mechanically. They
are inextricably bound up with semantics 
(cf. Palermo & Molfese, 1972). The pres- 
ence of some pronoun interpretation rules in 
even five-year-olds (Chomsky, 1969) does 
not mean that these rules operate abstractly
and free of semantic influence. The
potential reliability and validity of syntax 
tests with uncontrolled semantics is low. 

The second potential reason for variabil- 
ity in tests of syntactic competence is dif- 
ferences in the extent to which a given item 
exceeds or stays within the child's channel 
capacity as an information processor. Re- 
cent findings by the present author (Les- 
gold, 1972) suggest that adult-like compre- 
hension of personal-pronoun sentences is 
more likely when a child can replace cum- 
bersome surface-structure segments with 
imaginal codes. Thus, imagery factors ap- 
parently play a role in comprehension per- 
formance. 

These arguments suggest that the design
of syntax comprehension curriculum cannot
readily be based upon tests of the sort described
here and by Bormuth et al. (1970).
Such tests have uncontrolled variance due
to imagery and semantic factors. The structures
tested are often (but not always, cf.
Palermo & Molfese, 1972) understood by
children when presented in circumstances
that have relatively simple semantics. On
the other hand, sufficiently complex semantics
will probably override knowledge of
any structure in young children. Perhaps
semantic analysis at the level of care that
Bormuth (1970) has brought to syntactic
analysis will be useful in the design of instruction.
However, at the present, the use
of difficulty orderings for syntax without
regard to semantics is not likely to lead to
improved instruction in comprehension.

It is certainly possible that abstract syntax
rules are being acquired during the ages
from which Bormuth's and the present samples
were drawn. Further, it is conceivable
that one might want to test for the presence
of such rules. This could be done by using
nonsense words instead of lexical words in
the various item forms used in this study.
This would yield measures of “pure” syntax 
ability. However, many syntactic rules in- 
teract with semantics, so this pure ability 
might not be very relevant. 

TASK, LEARNER, AND TREATMENT VARIABLES IN
INSTRUCTIONAL DESIGN


JOHN E. RHETTS¹
University of Minnesota


This research studied the effects of three variables (learner, task,
and treatment characteristics) on subject performance in two basic
learning paradigms (visual discrimination or “matching” and paired-
associate tasks). Significant interactions were found for number of
errors on both tasks: Learner Characteristic X Task Characteristic
on the matching task and Learner Characteristic X Treatment Mode
on the paired-associate task. Response latencies and postresponse
feedback examination times were analyzed, and generally, both were
inversely proportional to the number of errors. Results support the
need for instructional research to be based on the Attribute X
Treatment interaction model and to employ a “task-first” approach.


¹ Requests for reprints should be sent to John
E. Rhetts, Educational Psychology, 330 Burton
Hall, University of Minnesota, Minneapolis, Minnesota.


The long-standing interest among psychologists
in “individual differences” has
more recently led to a great deal of speculation
and research on reliable patterns of
interaction among learner, task, and treatment
variables in order to be able to individualize
instruction. Cronbach (1957) first
gave the name “aptitude-treatment interaction”
to such patterns. A basic assumption
underlying the aptitude-treatment-interaction
concept is that individuals are indeed
meaningfully different and that with sufficient
conceptual and methodological tools
these differences can be significantly utilized
in the design of instruction. The aptitude-treatment-interaction
believer rejects
the notion that there is one best instructional
procedure for teaching all individuals.

There are strong arguments in support of
significant individual differences from
many avenues of research, ranging from behavioral
genetics to psychology (McClearn
& Meredith, 1966; Newell & Simon, 1972).
This position does not deny that there are
demonstrable similarities or seemingly universal
characteristics among individuals
but rather that, in addition, individuals
vary in many potentially significant ways.
The pressing question is to identify and 
verify patterns of characteristics which are 
both (a) consistently ascribable to or mani- 
fested by an individual and (b) related to 
performance differences on some task(s). In 
such an endeavor, we must attend to the 
fact that the existence of such patterns is 
the product of an interaction: one of the 
organism with the environment. Psycholo- 
gists have long differed over which side of 
the interaction merits most attention, de- 
bating whether or not the “equipment” the 
individual carries with him exerts primacy 
over the structure and content of the task 
environment. However, the more poten- 
tially informative pursuit would be rigorous 
work on the following questions: 


1. For a given task, what identifiable and 
replicable patterns of characteristics among 
individuals are associated with significantly 
different levels of performance on the task? 

2. Are such patterns of individual-difference
characteristics involved in aptitude-treatment
interactions across a variety of
tasks or are they task specific? 

3. Are performance differences eradicable 
(partially or completely) by changing or 
modifying the task environment? 


This set of interrelated questions implies 
an interaction model. Ample support for 
this general approach can be found in research
from all the behavioral and social
sciences. Vale and Vale (1969) succinctly 
state three compelling reasons why psychol- 
ogists must seriously study human perform- 
ance from an interaction approach: 


1. Decades of debate over the so-called 
nature-nurture controversy have clearly re- 
vealed that neither variable can be ignored; 
interaction studies are more likely to reveal 
important findings than attempts at hairsplitting
over whether organism or environment
alone accounts for more of the variance.

2. Psychologists are becoming more in- 
terested in the study of processes within 
individuals and the interface of these infor- 
mation-processing capacities with the task 
at hand (e.g., Cronbach, 1967; Jenkins,
1967; Newell & Simon, 1972). 

3. Only by incorporating individual-dif- 
ference characteristics as positive elements 
(as opposed to “error variance”) can psy- 
chology build adequate general laws about 
human behavior. Psychologists cannot af- 
ford to ignore the demonstrable genetic and 
behavioral differences among individuals 
and cannot, therefore, search only for gen- 
eral, invariant functional relations among 
variables. 


Cronbach and Snow (1969) exhaustively 
reviewed hundreds of published aptitude- 
treatment-interaction (ATI) studies and 
concluded that 


Progress toward the goal of identifying and under- 
standing ATI has been slight.... There are no 
solidly established ATI relations even on a labora- 
tory scale, and no real sign of any hypothesis ready 
for application and development [p. 193]. 


Bracht (1970) appears basically to concur
with these findings:


Experimenters usually identified alternative treatments
and then through trial and error tried to
find personological variables to interact with treat- 
ments. The analysis of an interaction effect was 
often an afterthought rather than a carefully 
planned part of the experiment, i.e. the alternative 
treatments were not developed with the ATI concept
in mind. This approach has not been successful
for finding meaningful . . . interactions [p. 639].


Rhetts (1972) has proposed a research 
strategy designed to improve the information
yield from aptitude-treatment-interaction
studies. First, experimenters must distinguish
among three aspects of the aptitude-treatment-interaction
design: task
characteristics, learner characteristics, and
mode of presentation or instruction (treatment)
variables. Analysis of published
studies (e.g., Bracht, 1970) reveals that
aptitude-treatment-interaction researchers
have generally conceptualized the aptitude-treatment-interaction
problem as two dimensional:
learner variables and a diffuse
combination of task and treatment variables.
However, both treatment and task
characteristics can be expected potentially
to enter (singly and jointly) into interaction
with learner characteristics, thus
strongly suggesting that a three-dimensional
reference scheme should be used.

A second recommendation (Rhetts, 1972) 
centered upon the sequence of experimental 
design, the optimum sequence being as follows:
first, concentrate on an analysis of
the characteristics of the task (and its demands
on memory, knowledge, motivation,
abilities, etc.); second, identify plausible
individual-difference characteristics related
to the performance demands of the task;
and third, develop different treatments or
modes of presentation (different instructions,
cues, amount or distribution of practice,
available information or materials,
etc.) designed to influence performance differences.
One reason for adopting a “task-first”
sequence is the fact that one of the
outstanding features of the human organism
is its adaptive capability. This means
that the individual can and does modify his
behavior contingent upon the situation at
hand, by tailoring his own responses to the
nature of the task environment confronting
him. Thus, while it may be true that an
individual manifests a characteristic style
or strategy in dealing with some class of
tasks or problems, we do not really know
enough until we know whether this pattern
of similar behavior is (a) a response to
common stimulus aspects among the tasks
or (b) an overlearned behavior which is
generalized (nonadaptively) to the class of
tasks. If a is the case, we will also need to
know whether the nominal cues or stimuli
are effective (salient) for the individual or
not and whether the effective stimuli are or
are not relevant to solving the problem or
task. By initially attacking the aptitude-treatment-interaction
problem at the
learner-characteristics dimension, the experimenter
cannot know (a) what to attribute
either the consistency in an individual's
behavior or the commonality of behavior
among a group of individuals to and (b)
whether behavioral consistency or commonality
is functionally adaptive or nonadaptive
(e.g., perfunctorily repetitive). Work
by others (e.g., Garner, 1970; Jenkins,
1967; Newell & Simon, 1972) supports a
task-first approach.

A second reason for attacking the aptitude-treatment-interaction
problem in a
task-first sequence is that documented aptitude-treatment
interactions could significantly
influence and inform the process of
instructional design. Such problems begin
with the question “what is to be taught” or 
“what task is the learner to master?” Thus, 
in this context, it seems inappropriate to 
begin by asking what learner characteris- 
tics or presentational variables are relevant 
prior to defining the “what” of the instruc- 
tional problem. 

The following research was an effort to
investigate interactive effects among
learner, task, and treatment variables. The
two tasks used were familiar laboratory
ones, a visual discrimination or “matching”
paradigm and a paired-associate
learning paradigm, each with items at two
difficulty levels. Two tasks were chosen in
order to explore the degree to which performance
effects due to impulsivity-reflectivity
were either task specific or more generalized.
These two particular tasks were
chosen both because they appear as ingredients
in more complex classroom tasks (the
ultimate locus of interest) and because
an analysis had suggested that their
performance demands should differentially
affect subjects with different information-processing
habits.
The paired-associate paradigm is a two-phase
learning task involving both response
learning and relationship learning. From an
information-processing perspective, this
implies that a subject would need to allocate
sufficient time to fixate, chunk, and then
store the pairs of related information units
(first responses, then relations) through
short-term memory into long-term memory.
Because the items used here were so
constructed, the paired-associate task required
no extensive search or comparison
processing, and given that the response elements
(letters) were thoroughly familiar,
learner-controlled allocation of time (attention) for
fixation and storage was a major task
demand. In addition, paired-associate tasks
are generally structured so that the subject 
has access to feedback, and this introduces 
another control-of-time-allocation variable. 
As described more fully below, both a self- 
paced and a fixed-rate, postresponse feed- 
back condition were created. In the self- 
paced condition, subjects had to make an 
explicit decision about how long to attend 
to the feedback information, whereas in the 
fixed-rate condition, subjects were presented 
the feedback for an externally controlled 
interval. 

The matching task required subjects to 
demonstrate their ability to perform per- 
ceptual differentiation (Gibson, 1969) 
among objects (nonsense words) whose fea- 
tures (letters) were themselves already 
highly learned. This called for scanning and 
comparing the always-present standard 
with five alternative responses, identifying 
the discriminative information locations 
(distinctive features), and selecting which 
one of the five response choices was exactly 
like the standard. This task could be
characterized as primarily tapping a
performance, in contrast with tapping a learning
ability. Given that a subject was thor- 
oughly familiar with the alphabet and had 
no experimenter-imposed constraints on re- 
sponse or study time, the major demands of 
this task were on the subject's control of the
allocation of attention (time) to processes 
he/she already was competent in executing: 
accurate short-term memory fixation and 
successive scanning and comparing among 
six items. These operations of scanning for
information (discriminating letters) and 
checking hypotheses about matches, by 
contrast with the paired-associate task, 
made no long-term memory demands. Each
matching item was a separate event, and
the matching items required no response
learning and no information from one item
to be remembered or carried forward for use
on another item or at a later time. 

Undoubtedly, these tasks might be ana- 
lyzed in other terms and constructs, but the 
analysis of task demands above led to an 
interest in the learner characteristic of im- 
pulsivity-reflectivity, extensively studied 
principally by Kagan and his associates 
(e.g., Kagan, 1965; Kagan et al., 1964). In 
the present study, impulsivity-reflectivity
was conceptualized as a learner
characteristic which describes an individual's speed of
cognitive information processing. As im- 
plied by the task analyses above, on the 
matching task, speed of processing and re- 
sponding can be especially crucial at the 
point of recognizing distinctive or discrimi- 
native features and in selecting and evalu- 
ating potential correct responses from the 
array of available responses. On the paired- 
associate task, this trait should influence 
the critical function of allocating sufficient 
time to item fixation and storage into long- 
term memory of both responses and rela- 
tions. 

In the present study, impulsives were 
not expected necessarily to perform more 
poorly (commit more errors) than
reflectives. Kagan's classification scheme
(impulsive = shorter than median latency and
greater than median errors) was derived
from data on performance tasks of
unspecified difficulty. Since number of errors is one
index of difficulty, from the data Kagan has 
published it would appear that his tasks 
were roughly of moderate difficulty and, 
therefore, of moderate complexity. Consid- 
ering the parameters of the human orga- 
nism as an information-processing system, 
an individual attempting to process a mod- 
erately complex aggregate of information in 
a short time period could well be expected 
to make more errors. However, this does not 
inform us about the relative performance of 
the two types of individuals at other levels 
of task complexity. Thus, in this research 
both easy and difficult items were developed 
for both tasks. 
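
The double median split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the data layout, and the example scores are hypothetical, and the "reflective" rule (longer than median latency, fewer than median errors) is assumed here as the mirror image of the impulsive rule quoted in the text.

```python
from statistics import median


def classify_median_split(subjects):
    """Label subjects by a double median split on latency and errors.

    `subjects` maps a subject id to (mean_latency_seconds, total_errors).
    Returns a dict of id -> "impulsive", "reflective", or "unclassified".
    """
    latency_median = median(lat for lat, _ in subjects.values())
    error_median = median(err for _, err in subjects.values())
    labels = {}
    for sid, (lat, err) in subjects.items():
        if lat < latency_median and err > error_median:
            labels[sid] = "impulsive"      # fast and inaccurate
        elif lat > latency_median and err < error_median:
            labels[sid] = "reflective"     # slow and accurate (assumed mirror rule)
        else:
            labels[sid] = "unclassified"   # fast-accurate, slow-inaccurate, or on a median
    return labels


# Hypothetical scores for illustration only; not data from the study.
example_scores = {
    "s1": (4.0, 9),
    "s2": (15.0, 2),
    "s3": (5.0, 8),
    "s4": (14.0, 3),
    "s5": (6.0, 1),
    "s6": (13.0, 10),
}
labels = classify_median_split(example_scores)
```

Note that a split of this kind leaves fast-accurate and slow-inaccurate subjects unclassified, which is one reason the scheme says nothing by itself about performance at other levels of task complexity.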


Hypotheses 


This research was designed to test the 
hypothesis that the learner characteristic of 
impulsivity versus reflectivity would both
generalize to and influence error rates on
two basic learning and performance
paradigms (visual differentiation and
association learning); that is, this learner
characteristic was expected to generalize to more
educationally relevant performance
situations than the somewhat artificial tasks
studied by Kagan and his associates. More 
specifically, the hypotheses were as follows: 

1. Impulsivity-reflectivity would interact
with the task characteristic of item
complexity such that (a) impulsives would
perform at least as well as reflectives on
low-complexity (easy) items and (b) they
would perform less well on more complex
(difficult) items. Less complex items involve
or contain fewer “chunks” of information to
be processed, and therefore a rapid
responding strategy should not degrade
performance on easy items as it would be expected
to on more complex ones. The reflective's
longer deliberation should improve his
performance (relative to an impulsive
subject) only on more complex or difficult items. On
a less complex item, added deliberation 
should have no functional value, even pos- 
sibly resulting in debilitation through con- 
fusion or compulsive inability to decide on 
a response. 

2. Impulsivity-reflectivity would interact 
with the treatment variable of control over 
potential feedback such that (a) impulsives 
would not self-manage an opportunity to 
gain corrective feedback so as to optimize 
their performance as compared with other 
impulsives in a fixed-pace feedback condi- 
tion and (b) impulsives would make signif- 
icantly more errors than reflectives in a 
self-paced feedback condition. This effect 
was expected to be heightened on complex 
information items as compared with easy 
items. In general, impulsives were expected 
to “rush on” and fail to take advantage of 
available feedback when they had self-con- 
trol over the feedback interval. Therefore, 
they were expected to perform more poorly 
than reflectives who were expected to take 
greater advantage of such feedback when 
they were self-paced. Also, self-paced
impulsives were expected to perform more
poorly than impulsives exposed to
fixed-pace feedback.²


METHOD
Subjects 


A group of 64 second-grade children (24 boys 
and 40 girls) were administered the matching and 
paired-associate measures described below. These 
subjects were identified as the 32 more impulsive 
and 32 more reflective children from among the 
entire second-grade population (n = 90) of a pub- 
lic elementary school. (The complete testing of all 
90 was not possible due to teacher reactions to the 
multiple absences from class required by the pro- 
cedure.) The selection of the 64 was made on the 
basis of the individual’s mean response time across 
the 12 items of the Matching Familiar Figures 
Test, developed by Kagan et al. (1964). 
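
A selection of this kind can be sketched as follows. The sketch assumes that "more impulsive" means the shortest mean latencies and "more reflective" the longest; the text does not say how children between the two groups, or children with incomplete testing, were handled. The function name and the example latencies are hypothetical (three items per child are shown for brevity, whereas the test has 12).

```python
from statistics import mean


def select_extreme_groups(latencies_by_child, group_size):
    """Return (fastest, slowest) groups ranked by mean response time.

    `latencies_by_child` maps a child id to a list of per-item latencies.
    """
    mean_latency = {child: mean(times) for child, times in latencies_by_child.items()}
    ranked = sorted(mean_latency, key=mean_latency.get)  # shortest mean latency first
    if 2 * group_size > len(ranked):
        raise ValueError("groups would overlap: too few children for this group size")
    more_impulsive = ranked[:group_size]     # shortest latencies
    more_reflective = ranked[-group_size:]   # longest latencies
    return more_impulsive, more_reflective


# Hypothetical latencies in seconds; not data from the study.
example_latencies = {
    "a": [3, 4, 5],
    "b": [10, 12, 14],
    "c": [6, 6, 6],
    "d": [20, 22, 24],
    "e": [2, 2, 2],
    "f": [8, 9, 10],
}
impulsive_group, reflective_group = select_extreme_groups(example_latencies, group_size=2)
```

With 90 children and a group size of 32, the same call would leave the 26 children with intermediate latencies unselected.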


Stimulus Materials 


Two types of performance tasks were used: a
matching-with-standard task and a paired-associate
learning task. Individual items for the matching
task were nonsense words, that is, three- or
five-letter groups which did not have any interior
vowels. The easier trigrams (n = 14) were
constructed from low-pronounceability trigrams. The
more difficult five-letter words (n = 14) were
composed by putting together two of the
low-pronounceability trigrams and dropping one letter.
The difficulty level of the items was systematically
varied according to four criteria: first, the
number of letters in the item (three or five letters);
second, the number of distinctive (nonidentical)
letters in the standard as compared with the
competing (incorrect) alternative responses (ranging
from none to the maximum number of letters in
the item); third, the shape of letters (whether
ascending, descending, or line letters) as similar or
contrasting in the discriminative positions of
incorrect alternatives compared with the standard,
correct response; fourth, the location in the word
of the discriminative letter(s), that is, the first
position, last position, middle position, etc. [the first
position is the first attended to by beginning
readers, the last is next in order, and the middle (or
interior) positions are the last to be attended to as
discrimination or information locations; e.g.,
Williams, Blumberg, & Williams, 1970].

² The effect of this treatment variable was
studied only on the paired-associate task. The
principal reason was that treatment variables (in the
three-dimensional aptitude-treatment interaction
(ATI) scheme proposed by Rhetts, 1972) are of
interest chiefly as instructional manipulations or
conditions and in respect to their impact on
learning. As described above, the matching task was
construed as primarily a performance task.
Experimentally manipulating access to feedback in
the matching task would have changed its nature
from a performance task to one involving learning;
it would have introduced a memory demand
vis-à-vis response learning before subjects could have
begun information search and comparison
processing. However, altering this visual differentiation
task so that it involved a learning demand could
well be a useful experimental manipulation to
study in future research.

The paired-associate items were composed of 
pairs of alphabetic letters and line drawings repre- 
senting fragments of the respective letters. The 
partially isomorphic fragments were the stimuli, 
and the letters were the responses. Five difficult 
and five easy items comprised a list. Item difficulty 
was based on the number of letters which made up 
the response to a given item (one versus two) and 
on whether the orientation of the stimulus figure 
was constant from trial to trial. The more difficult 
items required a two-letter response, and the orien- 
tation (though not the shape) of the stimulus var- 
ied from one trial to another. [Rotations of 90, 180, 
and 270 degrees from the upright (normal) position 
were used.] Thus, as part of the task for the diffi- 
cult items, a subject saw four orientations of each 
stimulus and had to equate them effectively into 
a single stimulus. On the easy items, a given stimu- 
lus always had the same orientation in space and 
required a one-letter response. Within each diffi- 
culty level, four random orders of the five pairs 
were constructed; these were used to control for 
serial-learning effects. 


Procedure 


All material in the matching and paired-associ- 
ate tasks was presented via slide projector onto a 
rear projection screen. On the matching task, the 
subject responded by pressing the one of five but- 
tons which was aligned with the alternative on the 
screen he selected as being correct. Each subject 
was instructed to make his choice as quickly as he 
could and yet get it right and to continue to make 
a choice on each item until he selected the correct 
alternative. If he made an incorrect choice, the 
same slide stayed in view. If he selected correctly, 
the next slide was presented on the screen. 

On the paired-associate task, after one initial 
exposure to the entire set of stimulus-response 
pairs on the screen, the subject was instructed to 
say aloud the name of the letter which “went with” 
the stimulus figure (when it appeared in view for 
five seconds) as soon as he could remember or 
thought that he could. Guessing was stated to be 
acceptable. The subject was immediately informed 
about the correctness of his answer both verbally 
(“right,” “wrong”) and visually (green light, red
light). After each response, the subject was shown 
the stimulus-response pair. Half the subjects (fixed 
feedback condition) saw this slide for a fixed, five- 
second interval, while the other subjects (self-paced 
feedback condition) could themselves determine 
how long they looked at the postresponse pair slide. 

It was reasoned that perhaps testing all subjects 
on easy items prior to the difficult ones (or vice 
versa) might generate a general warm-up effect on 
the items presented second. Also, having the
subjects uniformly perform one task prior to the other
might introduce some systematic though unpre- 
dictable effect due to task order. Hence, for con- 
trol but not analysis purposes, subjects were ran- 
domly assigned to balanced groups with respect 
both to order of task presentation and order of 
items worked on first (easy or difficult), thereby 
tending to spread any such sequence (administra- 
tive) effects over all the treatment (experimental) 
effects (Winer, 1962). 
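
The counterbalancing just described might be sketched as below. This is a minimal illustration, assuming a simple shuffle followed by round-robin dealing into the four order cells; the cell labels, the function name, and the seed are hypothetical, and the text does not say whether assignment was also balanced within the impulsive and reflective groups.

```python
import random
from collections import Counter
from itertools import product

TASK_ORDERS = ("matching_first", "paired_associate_first")
ITEM_ORDERS = ("easy_first", "difficult_first")


def assign_counterbalanced(subject_ids, seed=None):
    """Randomly deal subjects into the task-order x item-order cells."""
    rng = random.Random(seed)
    cells = list(product(TASK_ORDERS, ITEM_ORDERS))  # four order combinations
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)                            # random order of subjects
    # Round-robin dealing keeps cell sizes within one subject of each other.
    return {sid: cells[i % len(cells)] for i, sid in enumerate(shuffled)}


assignment = assign_counterbalanced(range(1, 65), seed=0)
cell_sizes = Counter(assignment.values())
```

With 64 subjects and four cells, every order combination receives 16 subjects, so any sequence effect is spread evenly over the experimental conditions rather than analyzed.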


Results
Analysis

Main and interaction effects were examined
for the following independent variables:


1. Learner characteristic: impulsivity
versus reflectivity;

2. Task characteristic: item difficulty
(easy items versus difficult items; a
within-subjects, repeated-measure variable);

3. Treatment variable: fixed versus
self-paced postresponse feedback (for
paired-associate task only); and

4. Sex: boys versus girls (analyzed as a
control variable).


Separate analyses were done for each of 
three dependent variables: 


1. Number of errors,

2. log_e (response latency), and

3. log_e (postresponse, pair-slide examination
time) for self-paced subjects on the
paired-associate task only.


The number of errors was the dependent 
variable of primary interest; response and 
examination times were included in order to 
assist in interpreting and/or explaining in- 
teraction and main effects on the error 
scores. This yielded a total of five separate
analyses (two for the matching task and 
three for the paired-associate task). In each 
analysis, the within-subjects error term was 
used to test all main and interaction effects 
involving the repeated measures variable of 
item difficulty; the between-subjects error 
term was used for all other main and
interaction effects.


Findings 


Matching task. The number of errors
made by the impulsive and reflective
subjects on the easy and difficult items of the
matching task bears directly on one of the
study’s hypotheses; namely, rapid process- 
ing and responding should not handicap im- 
pulsives on the easy items, but the reflec- 
tives should make fewer errors on the diffi- 
cult items. The information in Figure 1 
shows that the Learner Characteristic X
Task Characteristic interaction conformed 
to the predicted pattern (F = 9.77, df = 1/ 
60, p < .003). It can be seen that, while all 
subjects made fewer errors on the easy 
items, the impulsives made fewer errors 
than the reflectives. And, although all sub- 
jects made substantially and significantly 
more errors on the difficult items, this in- 
crease was greater for the impulsives than 
the reflectives. Significant main effects were 
also found for item difficulty (F = 90.44,
df = 1/60, p < .001, confirming the a priori
classification of items as easy and difficult),
for impulsivity-reflectivity (F = 5.82,
df = 1/60, p < .02, impulsives more errors
overall), and sex (F = 4.64, df = 1/60, p < .04,
girls fewer errors overall).
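
The crossover pattern behind a Learner Characteristic X Item Difficulty interaction can be expressed as a difference of differences between cell means. The sketch below is illustrative only: the cell means are hypothetical values chosen to show the pattern described in the text, not the values plotted in Figure 1, and the function name is an assumption.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of mean errors.

    A positive value means errors rose more from easy to difficult items
    for impulsives than for reflectives.
    """
    impulsive = cell_means["impulsive"]
    reflective = cell_means["reflective"]
    impulsive_increase = impulsive["difficult"] - impulsive["easy"]
    reflective_increase = reflective["difficult"] - reflective["easy"]
    return impulsive_increase - reflective_increase


# Hypothetical cell means (mean number of errors); not the study's data.
hypothetical_means = {
    "impulsive": {"easy": 2.0, "difficult": 9.0},
    "reflective": {"easy": 3.0, "difficult": 6.0},
}
contrast = interaction_contrast(hypothetical_means)
```

In this made-up table the impulsives make fewer errors on easy items and more on difficult items, so the contrast is positive; a contrast of zero would correspond to parallel lines and no interaction.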

The data for log_e (response time) on the
matching task paralleled those for number
of errors. Both impulsives and reflectives
took significantly more time to respond
to the difficult items than to the easy
ones (F = 294.40, df = 1/60, p < .001),
and reflectives took significantly more
response time overall than did the impulsives
(F = 24.18, df = 1/60, p < .001). There
was a trend (F = 3.19, df = 1/60, p < .08)
for impulsives, in contrast to reflectives, to
increase their allocation of time to difficult
items compared to easy ones. There was
also a trend (F = 2.84, df = 1/60, p < .10)
for girls to respond more quickly than boys.
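
Every F ratio reported here has one numerator degree of freedom, so its probability level can be checked from the t distribution, since F(1, df) equals t squared. The following is a rough sketch using only the standard library; it integrates the t density numerically with Simpson's rule rather than calling a statistics package, and the function names are hypothetical.

```python
import math


def t_density(x, df):
    """Probability density of Student's t with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_f_1_df(f_ratio, df_error, steps=2000):
    """Upper-tail probability of F(1, df_error), using F = t**2.

    `steps` must be even because Simpson's rule is used.
    """
    t = math.sqrt(f_ratio)
    h = t / steps
    total = t_density(0.0, df_error) + t_density(t, df_error)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights 4, 2, 4, ...
        total += weight * t_density(i * h, df_error)
    central_half = total * h / 3          # P(0 < T < t)
    return 1.0 - 2.0 * central_half       # P(|T| > t) = P(F > f_ratio)


# Interaction reported above for matching-task errors: F = 9.77, df = 1/60.
p_interaction = p_value_f_1_df(9.77, 60)
```

For the reported interaction, the computed probability falls below .003, in line with the value given in the text.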

[Figure 1. Matching task mean number of errors
for the Impulsivity-Reflectivity X Item Difficulty
interaction. Axes: item difficulty (easy, difficult)
versus mean number of errors.]

Paired-associate task. The largest
influence on paired-associate-task error rates
was the difficulty level of the items; overall,
the subjects made more than four times as
many errors on the difficult items as they
did on the easy ones (F = 200.74, df = 1/56,
p < .001, as with the matching task,
clearly confirming the a priori
categorization of items as easy and difficult). The
treatment variable of fixed versus
self-paced feedback was involved in a
three-way interaction (F = 3.08, df = 1/56,
p < .08) with both sex and
impulsivity-reflectivity (see Figure 2). The markedly
different performance patterns seen in Figure 2
for impulsives compared with reflectives led 
to a nested analysis of sex by feedback 
within the learner characteristic factor (for 
impulsives, F = 5.87, df = 1/56, p < .02;
for reflectives, F = .03, df = 1/56, p > 
.87). Although the means suggested that (as 
hypothesized) both impulsives and reflec- 
tives did more poorly under the self-paced 
feedback condition and that the detrimental 
effect was greater on impulsives than reflec- 
tives, the F ratios for feedback (F = .44) 
and learner characteristic by feedback (F 
= .08) were not significant. The hypothe- 
sized Learner Characteristic X Item Diffi- 
culty interaction did not materialize here as 
it did on the matching task (F = .08). 

An analysis of time allocated by “self- 
paced” subjects to getting feedback re- 
vealed the following: reflectives made sig- 
nificantly greater use (F = 9.47, df = 1/28, 
p < .005) of the feedback opportunity on 
both easy and difficult items; all subjects 
paused significantly longer (F = 29.48, df 
= 1/28, p < .001) on difficult (versus easy) 
items; and impulsives and reflectives
equally increased their use of such an
opportunity on difficult items compared with
easy items (F = .22, df = 1/28, p > .65). 
Parenthetically, over all paired-associate 
items, the self-paced subjects allocated an 
average of 4.79 seconds to the postresponse 
pair slide as compared with the experi- 
menter-established interval of 5.0 seconds 
for subjects in the fixed-pace condition. 

The response time data for the paired- 
associate task generally paralleled those for 
the matching task: reflectives took
significantly longer (F = 4.96, df = 1/56,
p < .03) to respond overall; all subjects
significantly increased (F = 148.02, df = 1/56,
p < .001) their response latency on difficult
items; and reflectives did this somewhat 
more so (F = 3.23, df = 1/56, p < .08) 
than impulsives on the difficult items. 


Discussion 


[Figure 2. Paired-associate task mean number
of errors for the Impulsivity-Reflectivity X Sex X
Feedback Condition interaction. Axes: feedback
condition (self-paced, fixed) versus number of
errors. Abbreviations: I = impulsives,
R = reflectives, B = boys, and G = girls.]

Confirmation was found for each of the
study’s hypotheses. The characteristic of
speed of responding manifested itself on
both tasks, with impulsives consistently
and significantly showing shorter latencies
on each task, both for input (feedback
examination) as well as output (response)
activities. Thus, this learner-characteristic
variable seems to be one which is not task
specific. It was also associated with error
rates on both tasks: (a) It interacted
significantly (on the visual discrimination
task) with the intratask characteristic of
item difficulty. (b) It was involved (on the
paired-associate task) with both sex and
the treatment variable of control over
access to feedback in a three-way interaction
which did not reach statistical significance.

Overall, the point-biserial correlation
between subjects’ sex and the
impulsivity-reflectivity measure (the Matching Familiar
Figures Test) was -.03 (n = 90).
Stratifying the analyses by sex revealed that girls
made fewer errors than boys on the
matching task while at the same time responding
more rapidly. Sex manifested no significant
main effect on paired-associate errors,
though it did enter into a nearly significant
(p < .08) three-way interaction with
impulsivity-reflectivity and feedback
condition. However, further analysis revealed
that while reflectives of both sexes re- 
sponded similarly under both feedback con- 
ditions, impulsive boys and girls reacted 
significantly differently to the alternative 
treatment conditions. Impulsive girls, un- 
like impulsive boys, performed better in the 
fixed-pace feedback condition. It is not 
clear why this should be so, though one
might speculate about influences such as 
sex-typed social reinforcement for girls’ 
conformity and boys’ rebelliousness. How- 
ever, a more precise interpretation requires 
further empirical elaboration. These find- 
ings reinforce Sigel's (1965) admonition 
that researchers should include sex as an 
explicit analytic factor. 

The specific hypothesis concerning impul- 
sives’ equal or superior performance on less 
complex items received direct confirmation 
on the matching task: impulsives made 
fewer errors than the reflectives on the easy 
items but made more errors than the
reflectives on the difficult items. On the
paired-associate task, the main effect for
impulsivity-reflectivity was clearly nonsignificant,
as was the interaction with item difficulty.
That is, on the paired-associate (unlike the 
matching) task, error rates of impulsives 
and reflectives increased proportionately on 
the difficult items. The most direct 
interpretation of this finding involves task 
characteristics: The difficult items were so 
difficult (two-letter response, greater stimu- 
lus complexity, and rotational orientation 
through 360°) that even the reflectives’ 
greater allocation of time to both respond- 
ing and feedback was not significantly 
more effective in reducing errors. The data 
does not suggest that the task was function- 
ally impossible—10 of the 64 subjects, 5 
reflectives and 5 impulsives, reached the 
criterion of two perfect trials in a maximum 
of eight total trials. Thus it might be rea- 
sonable to speculate that if the difficulty 
were somewhat reduced and/or the maxi- 
mum number of trials were extended, the 
reflectives’ greater time allocation might 
pay off in fewer errors. A more complex
interpretation might be sought in
differences among intertask characteristics
(demands), but additional data would be
required to illuminate this line of
interpretation.

Hypothesis 2, concerning impulsives' per- 
formance under different treatment condi- 
tions, received support. The means for all 
subjects under the two feedback conditions 
and for impulsives versus reflectives within 
each feedback condition were in the “right” 
direction; but despite the rather large mean 
differences, they cannot be interpreted as 
strongly confirming Hypothesis 2 since the 
small F ratios imply that the within-cell 
variances were relatively large. However, 
the triple interaction which also involved 
sex (Figure 2) clearly showed that, as pre- 
dicted, the treatment condition had marked 
impact on impulsives’ performance while 
not markedly influencing reflectives’ per- 
formance and that impulsives' performance 
differences were also a function of the sub- 
jects’ sex. 

Generally, there was a strong parallel be- 
tween number of errors and both response 
and postresponse feedback times: the 
longer the response latency or examination 
time, the fewer the errors. The only notable 
exception to this was that on the matching 
task, girls made significantly fewer errors
and responded somewhat more quickly. In 
light of this strong trend, an intrigu- 
ing question remains concerning the impul- 
sive-reflective learner variable itself: What 
causes and/or supports such a characteris- 
tic, generalized, cross-task trait? A suspi- 
cion (which this research was not designed 
to resolve) might be that impulsives are 
less aware of or concerned by the outcomes 
or consequences of their behavior. In Bru- 
ner's (1956) vocabulary, the operational 
strategy of the impulsive might be de- 
scribed as minimizing the level of cognitive 
strain, even at the expense of increasing the 
risk of failure (error). The reflective, on the 
other hand, appears to minimize the risk of 
failure at the expense of cognitive strain 
(time and effort spent). To use another 
metaphor, one can consider the human
information-processing organism as a
closed-loop control system (e.g., Powers, 1973).
When such a system is confronted with a 
task (an event in the external environment 
which is sensed and represented internally), 
it necessarily confronts both (a) the proba- 
bility (very high) of having to make a re- 
sponse and (b) the probability (uncertain) 
that the feedback received from the envi- 
ronment will trigger an internal error signal 
and require further responses to reachieve a 
stabilized state. The impulsive could then 
be described as functioning to minimize the 
delay in accomplishing the highly certain 
need to make a response; the reflective, by 
contrast, functions so as to minimize the 
uncertain chance of negative environmental 
feedback. 

In summary, this research demonstrates 
that learner, task, and treatment
characteristics can combine in interaction with one
another to produce complex performance
differences. These findings, in turn,
underline the need for investigators interested in
instructional design to utilize an Attribute
X Treatment interaction design. However, 
researchers cannot simply “throw together” 
combinations of the three variables. A task- 
first approach (as used here) and a two- 
stage design for data collection (Rhetts, 
1972) should improve significantly the yield 
from such research. The companion
(second) stage for this study would be the
exploration of treatment variables which
could reduce the performance differences re- 
ported above. 
TEACHER BEHAVIOR ACROSS ABILITY GROUPS:
A CONSIDERATION OF THE MEDIATION OF PYGMALION EFFECTS 


JUDITH LANDON ALPERT* 
New York University 


Do teachers use more good behaviors with high- or low-reading-ability 
groups? Good teacher behaviors are defined as those teacher behaviors 
experts judged likely to increase pupil reading performance. Each 
of 15 second-grade Catholic school teachers’ top- and bottom-reading- 
group sessions were observed and were tape-recorded 3 times. A total 
of 90 reading group sessions were analyzed (15 Teachers X 2 Reading 
Groups X 3 Sessions = 90). The difference between good teacher be- 
haviors with the top and bottom reading group was compared by 
means of the correlated t test. It was found that teachers generally
treat the 2 reading groups the same with respect to the good behav- 
iors; that is, teachers treat the two ability groups similarly with 
respect to amount and quality of reading group time, number of 
reading group materials, and number of good verbal behaviors. 
Teachers did show preferential treatment to the bottom reading group 
by placing fewer pupils in that reading group. The findings indicated 
that teacher behavior may not be adversely affected by teacher
expectation.


Rosenthal and Jacobson (1968) hypothe- 
sized that teacher expectancies influence 
pupil growth in achievement. Serious meth- 
odological problems (Barber & Silver, 1968; 
Jensen, 1969; Snow, 1969; Thorndike, 1968; 
Thorndike, 1969) made their research highly 
questionable, and replication attempts 
(Claiborne, 1969; Kester, 1969; Meichen- 
baum, Bowers, & Ross, 1969) have yielded 
mixed results. 

Recently, attention has begun to focus on 
the events mediating teacher expectation 
and pupil performance. The general
conclusion of investigators citing differences in
pupil ability as a differentiating factor in
teacher-pupil interactions is that
low-ability pupils have fewer opportunities to
respond in the classroom (Good, 1970; Rist,
1970; Willis, 1970) and receive fewer
positive and more negative teacher contacts
(Brophy & Good, 1970; Good, 1970; Rist,
1970). Rasmussen’s (1961) and Schwarz’s
(1967) findings, however, do not support the
latter conclusion.


* Requests for reprints should be sent to Judith
Landon Alpert, Department of Educational
Psychology, School of Education, New York
University, Washington Square, New York, New York.

One major criticism of most of the inves- 
tigations concerning teacher expectation and 
teacher behavior is that observers knew 
which pupils were of high and low ability. 
Good's (1970) attempt to control observer 
bias is an exception. Good did not inform 
observers about pupil ability. However, it is 
likely that pupil ability became apparent 
during classroom observation. The present 
study investigated the mediation of Pygma- 
lion effects and attempted to control ob- 
server bias. 


METHOD 
Subjects 


The research was carried out in 15 second-grade 
classes housed in 11 New York City Catholic 
schools which serve a middle-class population. 
Schools were selected in which second-grade pupils 
were grouped into classes randomly and into read- 
ing groups by ability. Fourteen of the 15 teachers 
were lay teachers. Twelve of the 15 teachers held 
bachelor’s degrees. 


Instrument 


Good teacher behaviors are defined as those 
teacher behaviors which experts judged likely to 
increase pupil reading performance. The 39 experts
who participated in developing the list included 3 
teachers of reading teachers, 15 remedial reading 
teachers, and 21 first-grade teachers. All experts 
had served in their position for at least two years. 
Since a purpose of the research was to determine 
if the teacher gives more good behavior to one 
ability group, an attempt was made to develop 
a list of good teacher behaviors for the top read- 
ing group and for the bottom reading group. How- 
ever, according to expert opinion, good teacher be- 
haviors do not differ for the two reading ability 
groups; that is, the experts identified a list of five 
teacher behaviors which they judged likely to in- 
crease both top- and bottom-reading-group pupil 
performance. The good behaviors are (a) more 
reading group time, (b) best reading group time, 
(c) more materials in reading group, (d) fewer 
pupils in reading group, and (e) more "good" ver- 
bal behaviors. There are 20 good verbal behaviors. 


Development 


The list was developed as follows.² Three reading
specialists familiar with the purpose of the
study listed four teacher/nonverbal behaviors, each 
of which they judged to be objectively measurable 
and likely to increase pupil reading performance. 
The nonverbal behaviors were listed after discussion,
by consensus, and were judged good for
both reading groups. Responses to a questionnaire 
indicated that a minimum of 30 of the 36 reading 
specialists and teachers agreed that an increase in 
each of the four nonverbal behaviors was likely to 
increase the reading performance of pupils in high 
reading groups and in low reading groups. 

The procedure for listing good verbal behaviors 
was an adaptation of a method used by Ryans 
(1960) which was based on Flanagan's (1954) criti- 
cal incidents technique. This procedure involved 
the following: 36 reading specialists and teachers 
listed from a tape recording of a teacher who was 
teaching reading those teacher behaviors likely to 
increase bottom-reading-group pupil performance
and top-reading-group pupil performance. Approxi- 
mately half of the reading specialists and teachers 
were asked first to list good teaching for the top 
reading group. Reading specialists and teachers 
were also asked to justify their responses. The 20 
verbal behaviors were developed from those responses
listed by a minimum of 21 of the 36 reading
specialists and teachers which could be objectively
defined.

Responses indicated that good verbal behaviors 
were the same for both reading groups. Fewer than
10 responses were listed as good for one reading
group only. An attempt was made to validate the 
verbal items; that is, three reading specialists were 
asked to rank order tape recordings of four 
teachers. The two teachers ranked “less good” by 
all three specialists used relatively few good verbal
behaviors in comparison to the two teachers
ranked “good.” Percentage agreement was at least
76% between coders on three consecutive tape
recordings of reading group sessions. On one consistency
check during the four weeks of coding, the
percentage agreement (76%) was maintained.


2 A more detailed account can be found in Alpert
(1973).


Operational Definitions 


Reading group time was defined as that time 
period beginning when the teacher called the 
reading group together and ending when the 
teacher dismissed the reading group and was no 
longer working exclusively with any pupils in that 
reading group. Reading group time did not include 
time in which the teacher worked with pupils who 
were not members of the reading group in session. 
Best reading group time was defined as that one 
third of the morning school session and that one 
third of the afternoon school session in which the 
teacher reported she felt most motivated to teach. 
Reading group materials were defined as those ma- 
terials which the teacher used for instructional 
purposes during reading group time. Examples of 
reading group materials are as follows: blackboard, 
picture, slide, film, object, text, workbook, teacher-made
cards, flashcards, and supplementary reading
materials. Reading group pupils were defined as 
those pupils whose names appeared on the teacher’s 
top- or bottom-reading-group list at mid-year. Good 
verbal/reading group behaviors were defined as 
those 20 verbal behaviors listed in Table 2* which 
the teacher verbalized during reading group time. 


Procedure 


The classes were each visited 4 times over a 
four-week period. There was a total of 60 visits 
(15 teachers × 4 visits = 60 visits). The data collectors
(15 undergraduate students), teachers, and
principals were told that the purpose of the study 
was to learn more about the learning patterns of 
pupils of different abilities. The tape recording 
of reading group sessions enabled the observers to 
be naive. 

The purpose of the first visit was to habituate 
pupils and teachers to the presence of the observer 
and the tape recorder. The purpose of the next 
three visits was to collect data on teacher/nonver- 
bal behavior and to tape-record teacher/verbal 
behavior. 


Data Analysis 


Data for teacher/nonverbal behavior were coded 
from forms completed by data collectors. One ex- 
ception was the number of pupils, which was de- 
termined from the teacher's list at mid-year. Data 
for teacher/verbal behavior were coded by two 
coders from tape recordings. Observer bias was
minimized by tape-recording sessions, numbering
the recordings rather than identifying the reading
group on the recording, and organizing them so
that the two ability groups taught by one teacher
were not coded consecutively. Also, the coders did
not tape-record the sessions.


* A more detailed account can be found in Alpert
(1973).

Data for four of the five good behaviors (read- 
ing group time, best reading group time, reading 
group materials, and good verbal behaviors) were 
analyzed similarly as follows. Sums across the three 
sessions for each of the good behaviors and for each 
of the 20 good verbal behaviors were obtained for 
each teacher's top reading group and bottom read- 
ing group. The sums were divided by the number 
of teachers to obtain means for the two reading 
groups. The differences between these means were 
compared by way of the correlated t test. The
procedure used to analyze the data for the number 
of pupils differed in that means indicated the aver- 
age number of pupils for the two reading groups 
rather than the average number of pupils across 
three sessions for the two reading groups; that is, 
summing across sessions was inappropriate since 
the number of pupils per reading group was de- 
termined from the teacher's list at mid-year rather 
than from observations or tape recordings. 
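The analysis described above can be sketched in a few lines. This is a minimal illustration, not the author's actual computation: the function names are hypothetical, and it assumes the "correlated t test" is the usual paired t statistic, the mean of the top-minus-bottom differences divided by its standard error. With the Table 1 summary values for number of materials (mean difference 6.13 − 6.73 = −0.60, standard deviation of difference 2.13, 15 teachers), the sketch gives a value of about −1.09.

```python
import math
from statistics import mean, stdev


def t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic from summary values (hypothetical helper).

    mean_diff: mean of the paired differences (top minus bottom)
    sd_diff:   sample standard deviation of those differences
    n:         number of pairs (here, teachers)
    """
    if n < 2 or sd_diff <= 0:
        raise ValueError("need at least two pairs and a positive SD")
    return mean_diff / (sd_diff / math.sqrt(n))


def correlated_t(top, bottom):
    """Paired t statistic from raw per-teacher values for the two groups."""
    if len(top) != len(bottom):
        raise ValueError("each teacher needs one top and one bottom value")
    diffs = [a - b for a, b in zip(top, bottom)]
    return t_from_summary(mean(diffs), stdev(diffs), len(diffs))


if __name__ == "__main__":
    # Summary values for "number of materials" as printed in Table 1.
    print(round(t_from_summary(6.13 - 6.73, 2.13, 15), 2))
```

The statistic has n − 1 degrees of freedom, which is 14 for the 15 teachers in this study.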


RESULTS


The correlated t test results in Table 1 
indicate that significantly fewer pupils were
placed in the bottom than in the top read- 
ing group. Only an average of 8.87 pupils 
were in the bottom reading group as com- 
pared with an average of 13.87 pupils in 
the top reading group. Although the direc- 
tion of difference between means for the 
four remaining good behaviors (amount of 
time, amount of best time, number of materials,
number of good verbal behaviors) indicates
preferential treatment to the bottom
reading group, none of these differences are
significant.


TABLE 1
MEANS, STANDARD DEVIATIONS OF DIFFERENCE,
AND CORRELATED t VALUES FOR DIFFERENCES
IN GOOD TEACHER BEHAVIORS WITH TOP
AND BOTTOM READING GROUPS

                                   M for group
Behavior                           Top      Bottom   SD of difference   Correlated t

Amount of time                     72.27    79.47    15.64              -1.[illegible]
Amount of best time                22.60    27.13    34.86              [illegible]
Number of materials                 6.13     6.73     2.13              -1.09
Number of pupils                   13.87     8.87     4.83               4.07*
Number of good verbal behaviors   115.87   143.00    66.72              -1.34


* p < .01.
TABLE 2
MEANS, STANDARD DEVIATIONS OF DIFFERENCE,
AND CORRELATED t VALUES FOR DIFFERENCES
IN GOOD VERBAL BEHAVIORS WITH TOP
AND BOTTOM READING GROUPS

                              M for group
Verbal behavior               Top     Bottom   SD of difference   Correlated t

 1. Praise reading             9.93   17.13    10.44              -2.67
 2. Support                    4.93   10.13     8.64              -2.33
 3. Reinforce                 21.60   30.90    23.50              -1.53
 4. Encourage                  7.90   12.80     8.17              -2.12
 5. Praise behavior             .53     .13      .83               1.87
 6. Sequence                   1.80    3.07     3.63              -1.35
 7. Demonstration              1.33    1.00     2.06                .62
 8. Experience                 9.80   10.73     5.44               -.66
 9. Definition                13.27   11.27     2.78               2.78
10. Comparison                 4.47    3.07     3.62               1.50
11. Summarize                  2.60    4.07     2.59              -2.20
12. Specific question         18.27   19.70    14.14               -.40
13. Word family                3.47    1.80     5.68               1.14
14. Simplify reading           2.33    3.67     2.89              -1.79
15. Table of contents           .67     .33      .82               1.00
16. Resource                   2.60    2.42     3.98                .26
17. Question all               7.87   11.60     8.05              -1.79
18. Concern                     .60     .53     1.62              [illegible]
19. Teacher answer              .80     .73     1.16                .28
20. Encourage question          .13     .47      .90              -1.46


Note. The .05 level of significance could not be
used for each test since 20 t tests were done. Therefore,
a critical region (CR = 3.65) was determined
from Bonferroni's t statistic with simultaneous
.05 significance level under 20 behaviors (Miller,
1967).
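The decision rule in the note can be illustrated as follows. This is a sketch under stated assumptions: the critical value 3.65 is taken directly from the note rather than recomputed, the t values are read from Table 2 as legibly as the printed copy allows (behavior 18 is omitted because its t value is unreadable, and the unsigned .62 for behavior 7 is an assumed reading), and the names are hypothetical.

```python
ALPHA = 0.05        # simultaneous significance level stated in the note
N_TESTS = 20        # number of verbal behaviors tested
CRITICAL_T = 3.65   # critical region reported in the note (not recomputed here)

# Correlated t values as read from Table 2; behavior 18 is omitted (illegible).
T_VALUES = {
    "Praise reading": -2.67, "Support": -2.33, "Reinforce": -1.53,
    "Encourage": -2.12, "Praise behavior": 1.87, "Sequence": -1.35,
    "Demonstration": 0.62, "Experience": -0.66, "Definition": 2.78,
    "Comparison": 1.50, "Summarize": -2.20, "Specific question": -0.40,
    "Word family": 1.14, "Simplify reading": -1.79, "Table of contents": 1.00,
    "Resource": 0.26, "Question all": -1.79, "Teacher answer": 0.28,
    "Encourage question": -1.46,
}


def per_test_alpha(alpha, n_tests):
    """Bonferroni adjustment: split the overall alpha evenly across the tests."""
    return alpha / n_tests


def significant_behaviors(t_values, critical_t):
    """Names of behaviors whose |t| reaches the two-sided critical value."""
    return [name for name, t in t_values.items() if abs(t) >= critical_t]


if __name__ == "__main__":
    print(per_test_alpha(ALPHA, N_TESTS))
    print(significant_behaviors(T_VALUES, CRITICAL_T))
```

Under these readings no behavior reaches the critical region; the largest absolute value is 2.78 for Definition.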


Table 2 presents the 20 good verbal be- 
haviors given to the top reading group and 
to the bottom reading group. Results pre- 
sented in this table indicate no significant 
difference in the amount of good behavior 
given to the two groups. In summary, the 
data in Tables 1 and 2 indicate that teachers 
in this study generally treated the two read- 
ing groups equally. The small amount of 
teacher preferential treatment found was 
directed to the bottom reading group. Since 
there were fewer pupils in the bottom read- 
ing group, each pupil in the bottom group 
received more good behaviors; that is, each 
pupil in the bottom group was given more 
time and more best time, and more good 
verbal behaviors were directed to each pupil 
in that group.
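The per-pupil argument can be made concrete with the Table 1 means. The sketch below is only a rough illustration: dividing a group mean by the mean group size is an assumption about how "per pupil" might be approximated, the article does not report such figures, and the names are hypothetical.

```python
# Group means from Table 1; group size is the mean number of pupils per group.
GROUP_SIZE = {"top": 13.87, "bottom": 8.87}
GROUP_MEANS = {
    "amount of time": {"top": 72.27, "bottom": 79.47},
    "amount of best time": {"top": 22.60, "bottom": 27.13},
    "good verbal behaviors": {"top": 115.87, "bottom": 143.00},
}


def per_pupil(group_mean, group_size):
    """Approximate share of a group-level total that falls to one pupil."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return group_mean / group_size


def per_pupil_table():
    """Per-pupil approximation for each behavior and reading group."""
    return {
        behavior: {g: per_pupil(means[g], GROUP_SIZE[g]) for g in ("top", "bottom")}
        for behavior, means in GROUP_MEANS.items()
    }


if __name__ == "__main__":
    for behavior, groups in per_pupil_table().items():
        print(f"{behavior}: top {groups['top']:.2f}, bottom {groups['bottom']:.2f}")
```

On this approximation each bottom-group pupil receives more of every listed behavior, because the bottom group has both the larger group mean and the smaller group size.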
DISCUSSION


The results indicate that teachers gen- 
erally treat the top and bottom reading 
groups the same with respect to the good 
behaviors. Several findings in the present 
study are in accordance with the findings of 
other investigations. The finding that teach- 
ers place significantly more pupils in the 
top than in the bottom reading group has 
been reported by Groff (1962). Second, Ras- 
mussen's (1961) and Schwarz’ (1967) find- 
ing that teachers do not give more sup- 
portive comments to high- or low-ability 
pupils also supports the present finding. 
Third, Brophy and Good (1970) present 
data which indicate that teachers spend as 
much time with pupils in the two reading 
ability groups.

However, the conclusion of some investi- 
gators (Brophy & Good, 1970; Good, 1970; 
Rist, 1970; Rubovits & Maehr, 1971; Willis, 
1970) considering teacher instructional be- 
havior with pupils differing in ability is that 
high-ability pupils receive preferential treat- 
ment. Also, the results from the present 
study could be interpreted as indicating 
preferential treatment to the high-ability 
group; that is, Strong, McCullough, and 
Traxler (1967) and Spache and Spache 
(1969) contend that bottom-reading-group 
pupils need more good teaching than top- 
reading-group pupils. Since top-reading- 
group pupils in Grade 2 have basic reading 
skills, they are more capable of independent 
work than their less achieving peers. In that 
light, the general lack of preferential treatment
could be interpreted as discriminatory
to slow readers. 

What is clear, however, is the discrepancy 
in results between previous investigations 
and the present one. Moreover, this discrep- 
ancy occurred despite a similarity in the 
process variables investigated; that is, some 
process variables claimed to be mediators 
of the self-fulfilling prophecy by previous 
investigators were also considered in the 
present study. Specifically, Brophy and 
Good (1970) found that teachers reinforce 
and demand quality performance more with 
the top reading group. Moreover, this oc- 
curred when differences in number of correct 
responses and errors made by pupils were
taken into account. Although the teacher be- 
haviors studied by Brophy and Good were 
defined and scored differently than in the 
present study, “reinforce quality perform- 
ance” is similar to a category used in the 
present study (Verbal Behavior 1, praise). 
Also, “demanding quality performance” is 
similar to a combination of two categories 
used in the present study (Verbal Behavior
2, support; Verbal Behavior 4, encourage). 
However, the results in the present study in- 
dicate no differential treatment between 
groups for any of these verbal behaviors. 
Moreover, when frequencies in Categories 2 
and 4 are combined, the difference between 
groups is significant at the .05 level, indi- 
cating more good behaviors directed to the 
bottom reading group. 

Also, Good (1970), Rist (1970), and 
Rubovits and Maehr (1971) consider praise, 
and Good (1970), Rist (1970), Willis 
(1970), and Rubovits and Maehr (1971) 
consider attention. In the present study, 
praise was directly measured (Verbal Be- 
havior 1, praise reading; Verbal Behavior 5, 
praise behavior), and attention may be con- 
sidered to be indirectly measured (Non- 
verbal Behavior 1, amount of time). The 
results in most of the studies considering 
praise and attention, unlike the present in- 
vestigation, indicate discriminatory teacher 
behavior to low-ability pupils. 

One explanation for the discrepancy in 
findings is that Catholic school teachers may 
be more responsive to the needs of slow 
learners. However, Catholic school teachers 
in the present investigation are similar to 
public school teachers with respect to lay 
status and education. Difference in control 
for observer bias is another explanation for 
the discrepancy in findings. In the present 
investigation, there was some control for ob- 
server bias in coding the verbal behaviors. 
As indicated, the sessions were tape 
recorded, the level of the reading group was 
not labeled on the recording, and the two 
ability groups taught by one teacher were 
not coded consecutively. Also, the coders did 
not tape-record the sessions. Moreover, it is 
doubtful that observer bias could influence 
the objective-to-code nonverbal behaviors.
Observer bias was probably, however, not
completely controlled. To consider the con- 
trol for observer bias in the present study, 
two first-grade teachers were asked to 
identify the reading group recorded on ran- 
domly selected tapes. Although one teacher 
identified 6 out of 10 correctly and the other 
identified 3 out of 6 correctly, it is possible 
that identification would have improved 
over trials. 

The results of this study indicate a more 
tentative acceptance of the findings con- 
cerning the mediation of Pygmalion effects, 
a more sophisticated control of observer 
bias in future studies, and a more rigorous 
consideration of the needs of high- and low- 
ability pupils. This conclusion is further 
substantiated by the paucity of data con- 
cerning the relationship between the process 
variables and pupil performance. That is, 
the relationship between the process vari- 
ables and pupil performance has not been 
demonstrated by investigators considering 
the mediation of Pygmalion effects (Brophy 
& Good, 1970; Good, 1970; Rasmussen, 
1961; Rubovits & Maehr, 1971; Schwarz, 
1967; Willis, 1970) and is not indicated by 
a review of the literature on teacher behavior
and student achievement (Rosenshine,
1971). 

Also, the results indicate the need to consider
reasons other than teacher discriminatory
behavior for the relatively slow growth
of low-ability pupils. An alternative ex- 
planation is that educators do not know how 
to instruct pupils of low ability. There is 
little empirical data indicating appropriate
pedagogical strategies for the reading ability 
groups. Moreover, Chall (1966) indicates 
that pupils of high and average ability are 
less affected by teaching methods than low- 
ability pupils. The development of the list 
of good teaching behavior afforded a pre- 
liminary delineation of appropriate peda- 
gogical strategies for reading ability groups. 
Nevertheless, the relationship between the 
good behaviors and pupil performance has 
not been demonstrated. Whether those 
teacher behaviors identified by experts or 
others not yet identified by them affect 
bottom-reading-group pupil performance is
an important area for future investigation. 


JUDITH LANDON ALPERT 
While reading a prose passage, 80 college sophomores answered questions
after every two or every four paragraphs of text. Moreover,
the subjects responded to meaningful-learning questions requiring sub- 
sumption of facts under given ideas, questions involving rote learn- 
ing of facts, questions demanding rote learning of ideas, or task- 
irrelevant questions. When questions occurred more frequently, mean- 
ingful-learning questions resulted in recall of relevant and incidental 
information that was equal to or greater than rote-learning-of-ideas 
or task-irrelevant questions. Only meaningful-learning questions were 
adversely affected by less frequent pacing. It is argued that the more 
thorough processing associated with meaningful-learning questions 
necessitated their relative closeness in text. Furthermore, it appears 
that different types of questions can influence the direction (forward 
versus backward) and kind of processing activities associated with
them.


Some investigators (Frase, 1968d; Mc- 
Conkie, Rayner, & Wilson, 1973; Rothkopf 
& Bisbicos, 1967; Watts & Anderson, 1971) 
of mathemagenic behavior in prose contexts
have examined the effects on retention of 
various kinds of adjunct questions. In all of 
these experiments, different kinds of ques- 
tions were found to differentially influence 
the information readers obtained from the 
passages they had read. 

The main focus of the present investigation
is on a comparison between essentially
two types of questions. These two types of
questions reflect the distinction made by 
some psychologists (Ausubel, 1968; Di 
Vesta, 1972) among various levels of learn- 
ing such as that between rote, verbatim 
learning and higher order, meaningful 
learning. In rote learning, material appears
to be learned in a relatively random and
unordered fashion. More importantly, it is
not highly coded, since the subject acquires
what is to be learned in exactly the same
form as it is presented to him. In the context
of the present experiment, rote learning
implies that each sentence is processed as a
discrete unit with the subject making no
attempt to interrelate the various sentences
of the passage. It seemed to us that this
kind of processing activity might be induced
by interspersing questions in a text
that required the subject to recall as literally
as possible a word or words representing
a fact or an idea contained in one sentence
of the passage. The verbatim nature of
the answer to the question combined with
the attentional focus on one sentence should
effect the desired result, rote learning.


1 This study is based on a portion of the
author’s dissertation, which was presented to the
Department of Educational Psychology, Pennsylvania
State University, in partial fulfillment of
the requirements for the PhD degree. Francis J.
Di Vesta served as chairman of the dissertation
committee.

2 Requests for reprints should be sent to John
P. Rickards, Educational Psychology and Research
Section, SCC-G #55, Purdue University, West
Lafayette, Indiana 47907.

Meaningful learning, on the other hand, 
involves the organization of facts under 
given, higher level, related ideas (Ausubel, 
1968). Hence, we reasoned that questions 
requiring subjects to organize or subsume 
facts in a passage under superordinate ideas 
provided in the questions would be likely to 
produce meaningful learning of the ques- 
tioned material. 

Research employing "advance organiz- 
ers" or themes (Ausubel, 1960; Dooling & 
Lachman, 1971) rather than questions as
"orienting directions" (Frase, 1970) has 
demonstrated the superiority of meaningful 
learning over rote learning of prose mate- 
rial. In light of the above evidence, our 
primary hypothesis was that meaningful- 
learning postquestions would facilitate re- 
tention more than rote-learning postques- 
tions. 

Closely related to the use of questions for 
controlling mathemagenic behavior is the
frequency with which they are employed. It 
has been theorized (Rothkopf, 1963) that 
postquestions serve, at least in part, to rein- 
force appropriate mathemagenic activity. 
Moreover, it is nearly axiomatic that fre- 
quent reinforcement fosters the develop- 
ment of the reinforced behavior. Thus, we 
predicted that the frequency of the questions
would be directly related to performance.
This hypothesis has been confirmed in
studies by Frase (1968a, 1968c).


METHOD


Subjects 


The subjects were 80 college sophomores en- 
rolled in an introductory educational psychology 
course. Although participation in the experiment 
was voluntary, subjects did receive credit toward 
their course grade for such participation. Each 
subject was assigned to one of the conditions (n =
10) by reference to a table of random numbers.
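The assignment step can be sketched as below. Note the substitution: the study used a printed table of random numbers, whereas this sketch uses a seeded pseudorandom shuffle to the same end. It assumes the eight conditions are the four question types crossed with the two question frequencies, and the function and label names are hypothetical.

```python
import random
from itertools import product

QUESTION_TYPES = ["meaningful-learning", "rote-learning-of-ideas",
                  "rote-learning-of-facts", "task-irrelevant"]
FREQUENCIES = ["after two paragraphs", "after four paragraphs"]


def assign_subjects(subject_ids, conditions, per_cell, seed=None):
    """Randomly split subjects into equal-sized groups, one per condition."""
    subject_ids = list(subject_ids)
    if len(subject_ids) != len(conditions) * per_cell:
        raise ValueError("subject count must equal conditions x per_cell")
    rng = random.Random(seed)   # seeded only to make the sketch reproducible
    rng.shuffle(subject_ids)
    return {
        cond: subject_ids[i * per_cell:(i + 1) * per_cell]
        for i, cond in enumerate(conditions)
    }


if __name__ == "__main__":
    conditions = list(product(QUESTION_TYPES, FREQUENCIES))  # 4 x 2 = 8 cells
    groups = assign_subjects(range(1, 81), conditions, per_cell=10, seed=1)
    for cond, members in groups.items():
        print(cond, sorted(members))
```

Shuffling once and slicing guarantees exactly 10 subjects per cell, which independent random draws per subject would not.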


Materials 


The experimental materials were a modification 
of those used by Bruning (1970) and consisted of 
an 800-word passage describing the characteristics 
of a fictitious African nation, “Mala.” The various 
paragraphs of the passage were unrelated to one 
another to the extent that each paragraph ad- 
dressed a different aspect of the geography, econ- 
omy, government, history, or social conditions of 
the fictitious nation. Each paragraph consisted of 
a topic sentence (general idea) followed by three 
related subordinate sentences (specific facts). In 
accordance with Bruning’s (1970) procedure, the 
allowable relationships between superordinate
(topic) and subordinate sentences were as follows: 


(a) the subordinate statement restated the re- 
lationship of the response term to the stem 
portion of the superordinate sentence in more 
specific and explicit form, (b) the subordinate 
statement was a specific reason for the relation- 
ship between the response term and the stem 
portion of the superordinate statement, or (c) 
the subordinate statement revealed a particular 
characteristic of the response term in the superordinate
statement, in some way typifying or
[illegible] the nature of the response term itself
[p. 188].


The 800-word passage consisted of eight text 
segments, each of which contained two 50-word 
paragraphs. For purposes of illustration, one text 
segment follows: 


The southern area of Mala can best be described 
as a desert. Rainfall is less than 2 inches per 
year in southern Mala. The soils in the southern 
area of Mala are either rocky or sandy. In the 
summertime temperatures have been recorded as 
high as 135 degrees in southern Mala. 


The history of Mala has been marked by ex- 
ploitation. The first slaves were forcefully taken 
from Mala to Europe in 1610. When Europeans 
came over to Mala to settle there, they never 
paid the Malans for the land they occupied. 
Prior to the coming of the Europeans, Arab 
nomads frequently plundered villages in Mala. 


One paragraph from each text segment was al- 
ways the target for the various types of interspersed 
questions employed. Such questioned paragraphs 
were referred to as related paragraphs; those not 
questioned were called unrelated paragraphs. Three 
types of experimental questions were constructed 
for each of the eight related paragraphs. The first 
type of question, rote-learning-of-facts questions, 
were directed toward specific facts in the subordi- 
nate sentences of the related paragraphs. Literal, 
verbatim recall of this information was all that 
was required to answer rote-learning-of-facts 
questions. Referring to the text segment illustrated 
above, a rote-learning-of-facts question was "How
many inches of rain fall per year in southern 
Mala?" 

A second type of question, rote-learning-of-ideas 
questions, were aimed at ideas represented in the 
superordinate (topic) sentences of the related 
paragraphs. Like the rote-learning-of-facts ques- 
tions, rote-learning-of-ideas questions only required 
the subjects to literally recall passage material. The 
rote-learning-of-ideas question appropriate for the 
related paragraph of the above text segment is 
“What geographical term best describes southern 
Mala?” 

The third type of experimental question, mean- 
ingful-learning questions, required the subjects to 
organize or subsume the facts of the subordinate 
sentences under the ideas contained in the super- 
ordinate topic sentences of the related paragraphs. 
Using the text segment illustrated above, the 
meaningful-learning question was “Why can it be 
said that southern Mala is a desert?” In addition 
to these three types of experimental questions, 
control group questions were developed which were 
totally irrelevant to the passage. Such questions, 
which were termed task-irrelevant questions, were 
devised to control for whatever effect (e.g., in- 
creasing attention) the mere answering of ques- 
tions might have and were totally unrelated to the 
passage so as to prevent interference.

A booklet was prepared for each subject containing
one text segment per 5 × 8 inch page
with separate sheets of paper containing one or 
two test questions per page. 


Design 


Each subject responded to one of the four types 
of questions explicated above, rote-learning-of- 
facts, rote-learning-of-ideas, meaningful-learning, 
or task-irrelevant questions. In addition, the fre- 
quency of the questions varied such that half of 
the subjects received one question after every two 
paragraphs of text, while the other half received 
two questions after every four paragraphs of text. 
All subjects were given the same number of ques- 
tions; only the frequency of the questions was al- 
lowed to vary. Thus, the two between-subjects 
factors of the design for this study were the level 
of question variable (meaningful-learning, rote- 
learning-of-ideas, rote-learning-of-facts and task- 
irrelevant questions) and the frequency of ques- 
tion variable (after two paragraphs and after four 
paragraphs). 

At the conclusion of the experiment, a free-re- 
call test was administered. Regardless of the partic- 
ular condition in which a subject participated, each 
protocol was scored for each of the following de- 
pendent variables: 


1. Relevant facts—facts in the related paragraphs
questioned by rote-learning-of-facts questions;

2. Relevant ideas—ideas in the topic sentences 
of the related paragraphs questioned by rote- 
learning-of-ideas questions; 

3. Relevant subsumed facts—facts in the re- 
lated paragraphs other than the relevant facts 
about which the subject was questioned in 
the meaningful-learning question condition. In 
actuality, meaningful-learning questions directed 
the subject toward all the facts (including 
relevant facts) in the related paragraphs; 

4. Incidental facts—facts in the unrelated 
paragraphs, that is, paragraphs about which there 
were no questions; and, 

5. Incidental ideas—ideas in the topic sentences
of the unrelated paragraphs.


Procedure 


Instructions informed the subjects that the pur- 
pose of the experiment was to examine the effects 
on learning of asking questions during the reading 
of written materials. The instructions also indicated 
that in answering the questions the subjects would 
neither be allowed to take any notes nor to turn 
back to a page once it had been read. The sub- 
jects were given 30 seconds to read each page of 
the passage and 10 seconds to answer each ac- 
companying question. At the conclusion of the 
experiment, the subjects were given 15 minutes 
to write down what they could recall about the 
country of Mala. The entire experiment including 
the free-recall test required about 28 minutes to 
administer. 
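The stated timings can be checked with simple arithmetic. The sketch assumes eight text pages (one text segment per page) and eight questions (one per related paragraph, the same number in both frequency conditions); how the remaining time was spent is not stated in the text, so the remainder is reported without interpretation. All names are hypothetical.

```python
TEXT_PAGES = 8             # one text segment per page (assumed from Materials)
QUESTIONS = 8              # one question per related paragraph (assumed)
SECONDS_PER_PAGE = 30
SECONDS_PER_QUESTION = 10
RECALL_MINUTES = 15
REPORTED_TOTAL_MINUTES = 28


def timed_seconds():
    """Seconds accounted for by reading, answering, and free recall."""
    reading = TEXT_PAGES * SECONDS_PER_PAGE
    answering = QUESTIONS * SECONDS_PER_QUESTION
    recall = RECALL_MINUTES * 60
    return reading + answering + recall


def unaccounted_minutes():
    """Minutes of the reported session not covered by the timed activities."""
    return REPORTED_TOTAL_MINUTES - timed_seconds() / 60


if __name__ == "__main__":
    print(timed_seconds() / 60)    # timed portion, in minutes
    print(unaccounted_minutes())   # remainder of the reported 28 minutes
```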
As suggested earlier, each sentence of the
passage consisted of either a fact or an idea. For 
scoring purposes, each sentence was reduced to a 
phrase of three to four words. This procedure is 
based in part on a method developed by Cofer 
(1941) and more recently used by Howe (1970). 
For example, one of the text sentences reads “In 
the summertime temperatures have been recorded 
as high as 135 degrees in southern Mala,” and the 
reduction reads “temperatures 135 degrees in 
southern Mala.” Moreover, alterations in phrasing 
and synonyms were considered correct if they did 
not change the meaning of a given sentence. An 
acceptable semantic equivalent of the text sen- 
tence given above might be, “It is very hot in 
southern Mala.” 

In order to determine the objectivity of the 
scoring procedure, two raters judged both the 
number of ideas and the number of facts generated 
in each of 45 randomly selected free-recall protocols.
The Pearson product-moment correlation for
the total number of facts recalled was .96; for the 
total number of ideas recalled by the subjects, the 
product-moment correlation was .95. These data 
provide evidence that the measures were adequately 
objective for the purposes of the present experi- 
ment. 
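The interrater check rests on the Pearson product-moment correlation, which can be computed as in the sketch below. The function name and the rater counts in the usage lines are hypothetical illustrations; the article reports only the resulting coefficients (.96 and .95), not the raw scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical fact counts from two raters for five protocols.
    rater_a = [12, 18, 9, 22, 15]
    rater_b = [13, 17, 9, 21, 16]
    print(round(pearson_r(rater_a, rater_b), 2))
```

A high correlation shows that the raters order the protocols similarly; it does not by itself show that they assign identical counts.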


RESULTS


In this experiment, there were three mu- 
tually exclusive measures of relevant infor- 
mation: relevant facts, relevant subsumed 
facts, and relevant ideas. Further, there 
were two mutually exclusive measures of 
incidental information—incidental facts 
and incidental ideas. Accordingly, six sepa- 
rate analyses of variance were done—one 
for each of the above dependent variables 
and one for the total amount of recall. Ta- 
ble 1 presents the means for each of the 
eight conditions of the experiment for each 
of the dependent variables employed. 


Total Recall 


The effect of the various types of ques- 
tions on total recall of information in the 
passage was significant (F = 18.12, df = 3/72,
p < .01) as was the effect due to question
frequency (F = 9.39, df = 3/72, p < .01).
However, these two variables, question
type and question frequency, interacted
significantly (F = 7.81, df = 3/72, p < .01).

For more frequent questions (after every
two paragraphs), the analyses (Newman-Keuls)
demonstrated that meaningful-learning
questions produced significantly (p < .01)
more total recall than any other
type of question used in this study. Moreover,
rote-learning-of-ideas questions yielded
greater (p < .05) total recall than either
rote-learning-of-facts or task-irrelevant
questions which, in turn, did not differ from
each other. When questions occurred after
every four paragraphs, however, only those
receiving rote-learning-of-ideas and rote-learning-of-facts
questions performed significantly
(p < .05) better than the control
(task-irrelevant) question group.

And finally, question frequency (after
two or after four paragraphs) only had a
significant (p < .01) influence on meaningful-learning
questions.


TABLE 1
MEANS OF THE EIGHT CONDITIONS OF THE EXPERIMENT FOR EACH
DEPENDENT VARIABLE

                                     Frequency × Level of Question
                            After two paragraphs          After four paragraphs
Dependent measure           ML     RLI    RLF    I        ML     RLI    RLF    I

Total recall                34.90  27.00  20.90  15.00    20.30  24.40  21.90  15.30
Relevant facts               4.70   2.60   3.20    .90     2.50   3.20   4.00   1.20
Relevant subsumed facts     10.40   8.90   6.00   3.90     6.40   7.20   7.00   4.20
Relevant ideas               5.70   5.10   2.60   2.20     4.10   4.90   2.50   2.20
Incidental facts            10.00   6.60   6.90   4.90     5.30   5.80   5.80   4.90
Incidental ideas             4.10   3.80   2.20   3.10     2.00   3.30   2.60   2.80

Note. Abbreviations: ML = meaningful learning, RLI = rote learning of ideas,
RLF = rote learning of facts, and I = task irrelevant.
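The pattern behind the interaction can be read off the Table 1 cell means. The sketch below is illustrative only: it computes the drop in mean total recall from the two-paragraph to the four-paragraph condition for each question type, and collapses the relevant-ideas means over frequency (a simple average, appropriate here because every cell has 10 subjects). It performs no significance test, and the names are hypothetical.

```python
# Cell means from Table 1, keyed by question type:
# (after two paragraphs, after four paragraphs).
TOTAL_RECALL = {"ML": (34.90, 20.30), "RLI": (27.00, 24.40),
                "RLF": (20.90, 21.90), "I": (15.00, 15.30)}
RELEVANT_IDEAS = {"ML": (5.70, 4.10), "RLI": (5.10, 4.90),
                  "RLF": (2.60, 2.50), "I": (2.20, 2.20)}


def frequency_effect(cells):
    """Mean after two paragraphs minus mean after four, per question type."""
    return {q: two - four for q, (two, four) in cells.items()}


def collapse_over_frequency(cells):
    """Average the two frequency conditions (equal cell sizes assumed)."""
    return {q: (two + four) / 2 for q, (two, four) in cells.items()}


if __name__ == "__main__":
    for q, diff in frequency_effect(TOTAL_RECALL).items():
        print(f"{q}: {diff:+.2f}")
    for q, m in collapse_over_frequency(RELEVANT_IDEAS).items():
        print(f"{q}: {m:.2f}")
```

The frequency difference is large only for meaningful-learning questions (14.60 items of total recall, against 2.60 or less in absolute value for the other types), which is what a type-by-frequency interaction looks like in the means.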


Relevant Facts 


Although the effect of the various types 
of questions on the recall of relevant facts 
was significant (F = 15.06, df = 3/72, p < 
.01), the results must be qualified by the 
significant interaction between the types of 
questions and their frequency (F = 5.08, df
= 3/72, p < .01).

Simple effects of the types of questions 
for each level of question frequency (New- 
man-Keuls procedure) indicated that all 
task-relevant questions (rote-learning-of- 
facts, rote-learning-of-ideas, and meaning- 
ful-learning questions) led to significantly 
(p < .05) better recall than the control 
(task-irrelevant) questions, regardless of 
question frequency. Among the contrasts 
for the three experimental question types, a 
significant (p < .01) difference was found 


only between meaningful-learning and rote- 
learning-of-ideas questions, which differed 
only when questions were presented more 
frequently (after every two paragraphs). 
Thus, subjects responding to factual ques- 
tions did not recall any more of the facts 
for which they were specifically questioned 
than did subjects receiving either type of 
general idea question. 

Comparisons (t tests) of the simple ef- 
fects of question frequency for each type of 
question revealed that only the difference 
between meaningful-learning questions pre- 
sented after two paragraphs and after four 
paragraphs of text was significant (t = 
2.99, p < .01). Thus, only subjects respond- 
ing to meaningful-learning questions per- 
formed better when the questions occurred 
more frequently. 


Relevant Ideas 


In the analysis of relevant ideas, only the 
effect due to the types of questions was 
significant (F = 23.14, df = 3/72, p < .01). 
Using the Newman-Keuls procedure, both 
the effects of meaningful-learning questions 
(X = 4.90) and rote-learning-of-ideas ques- 
tions (X = 5.00) were found to be signifi- 
cantly (p < .01) greater than rote-learn- 
ing-of-facts questions (X = 2.55) or task- 
irrelevant questions (X = 2.20). No other 
mean differences were significant (p > .05) 
in this analysis. These results imply that 
questions requiring either meaningful learn- 
ing or rote learning of ideas lead to greater 
recall of the questioned ideas than questions 
requiring common knowledge or rote learn- 
ing of facts. Further, with these question 
types, frequency does not appear to be an 
influential factor in the recall of the ques- 
tioned ideas. 
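
The question-type means quoted in this analysis can be recovered from Table 1 by averaging each question type over the two frequency levels. The following is a minimal sketch of that arithmetic, not the authors' computation; it assumes equal cell sizes (so that a simple average of the two cell means is appropriate), and the names `relevant_ideas` and `marginal_means` are introduced here only for illustration.

```python
# Cell means for "Relevant ideas", copied from Table 1.
# Each value is (after two paragraphs, after four paragraphs).
relevant_ideas = {
    "ML": (5.70, 4.10),   # meaningful learning
    "RLI": (5.10, 4.90),  # rote learning of ideas
    "RLF": (2.60, 2.50),  # rote learning of facts
    "I": (2.20, 2.20),    # task irrelevant
}


def marginal_means(cells):
    """Average each question type over frequency levels (assumes equal n per cell)."""
    return {q: round(sum(v) / len(v), 2) for q, v in cells.items()}


means = marginal_means(relevant_ideas)
for question_type, mean in means.items():
    print(f"{question_type}: {mean:.2f}")
```

Under the equal-n assumption, the four averages agree with the means of 4.90, 5.00, 2.55, and 2.20 reported in the text.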


Relevant Subsumed Facts 


The effect of the various types of ques- 
tions on the recall of relevant subsumed 
facts was significant (F = 11.44, df = 3/72, 
p < .01). However, this variable interacted 
with question frequency (F = 3.68, df = 3/ 
72, p < .01). 

For questions occurring more frequently 
(after every two paragraphs), analysis us- 
ing the Newman-Keuls procedure revealed 
that idea-oriented questions (meaningful 
learning and rote learning of ideas) were 
significantly (p < .05) superior to questions 
demanding rote learning of facts or ques- 
tions testing common knowledge (task- 
irrelevant questions). In this case, superior- 
ity was measured in terms of the number of 
facts recalled that were tapped only by the 
questions (meaningful learning) requiring 
subsumption of those facts. Thus, focusing 
attention on a superordinate idea in a 
paragraph rather than on one of the sub- 
ordinate facts yields greater recall of the 
remaining facts in a paragraph. 

The only significant simple effect due to 
question frequency was that for meaning- 
ful-learning questions (t = 3.42, p < .01). 
Comparable to the results presented for rel- 
evant facts, meaningful-learning questions 
were more effective when questions ap- 
peared relatively close together; rote-learn- 
ing questions, on the other hand, were unaf- 
fected by question frequency. 


Incidental Facts 


Analysis of the data for recall of inciden- 
tal facts revealed effects for question type 
(F = 3.58, df = 3/72, p < .05), question 
frequency (F = 7.71, df = 1/72, p < .01), 
and the Type of Question × Frequency in- 
teraction (F = 3.08, df = 3/72, p < .05). 

For questions following every two para- 
graphs of the passage, Newman-Keuls com- 
parisons indicated that meaningful-learning 
questions were significantly (p < .05) 
greater in the recall of incidental facts than 
rote-learning-of-ideas, rote-learning-of- 
facts, or task-irrelevant questions, which 
did not differ significantly from each other. 
No differences (p > .05) among contrasts 
of treatment means were found when ques- 
tions were used after every four paragraphs 
of text. The analyses of the two levels of 
question frequency for each of the four 
types of questions yielded only a significant 
simple effect due to frequency for meaning- 
ful-learning questions (t = 3.96, p < .001). 
Again, these data indicate that the effect of 
questions requiring meaningful learning of 
material is apparent only when such ques- 
tions occur relatively frequently. 


Incidental Ideas 


The analysis of the scores based on the 
recall of incidental ideas demonstrated an 
effect due to the frequency of the questions 
(F = 4.29, df = 1/72, p < .05) and a Type 
of Question × Frequency interaction (F = 
3.06, df = 3/72, p < .05). 

The analyses of simple effects via the 
Newman-Keuls procedure demonstrated 
that when questions were used more fre- 
quently, both meaningful-learning and 
rote-learning-of-ideas questions led to sig- 
nificantly (p < .05) better recall than that 
of the rote-learning-of-facts questions but 
not of the task-irrelevant questions. No 
other mean differences were significant for 
either level of question frequency. And 
again, the only significant simple effect due 
to question frequency was that for mean- 
ingful-learning questions (t = 3.48, p < 
.001). 


Correlational Analyses 


Correlations among the five dependent 
variables were computed for each of the 
conditions employing different types of ques- 
tions presented after every two paragraphs 
of text. These four conditions (Meaningful 
Learning 2, Rote Learning of Ideas 2, Rote 
Learning of Facts 2, and Task Irrelevant 2) 
consistently yielded different amounts of 
recall as demonstrated in all of the above 
analyses. We reasoned that correlational 
analyses of the sort mentioned above might 
aid us in our interpretation of these effects. 
The results revealed two significant cor- 
relations for Rote-Learning-of-Ideas 2 
questions and four significant correlations 
for Meaningful-Learning 2 questions. No 
significant correlations among the depend- 
ent variables were found for Rote-Learn- 
ing-of-Facts 2 or Task-Irrelevant 2 ques- 
tions. 

Regarding Rote-Learning-of-Ideas 2 
questions, a significant correlation (r = .67, 
p < .05) was observed between relevant 
ideas and incidental ideas, thereby suggest- 
ing that Rote-Learning-of-Ideas 2 ques- 
tions may have engendered a set to learn 
ideas. Among subjects given Rote-Learning- 
of-Ideas 2 questions, there was also a tend- 
ency to recall together facts from the re- 
lated paragraphs (r = .70, p < .05, between 
relevant facts and relevant subsumed 
facts). Taken together, these correlations 
suggest that Rote-Learning-of-Ideas 2 
questions produced a tendency for facts to 
become associated with other facts (at least 
in related paragraphs) and ideas to become 
associated with other ideas, but not a tend- 
ency for facts to become associated with 
ideas. 

While this latter-mentioned tendency did 
not occur with Rote-Learning-of-Ideas 2 
questions or with any other type of ques- 
tion, it was manifest to a significant degree 
with Meaningful-Learning 2 questions (r = 
.74, p < .01, between relevant ideas and 
relevant facts; r = .62, p < .05, for rele- 
vant ideas and relevant subsumed facts; r 
= .74, p < .01, between incidental ideas 
and incidental facts). These data strongly 
suggest that Meaningful-Learning 2 ques- 
tions engendered subsumption, the process 
whereby facts are related to ideas. 

The final correlation (r = .64, p < .05, 
between relevant facts and incidental facts) 
implies that if subjects given Meaningful- 
Learning 2 questions recalled relevant facts, 
they also tended to recall incidental facts. 
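
For readers who wish to check the reported correlations against a t table, a Pearson r can be converted to a t statistic with n - 2 degrees of freedom. The sketch below is illustrative only. The per-condition sample size is not stated in this section; the value of 10 used here is an assumption, inferred from the error df of 72 with eight conditions and equal cell sizes. The function name `r_to_t` is likewise introduced for illustration.

```python
import math


def r_to_t(r, n):
    """Convert a Pearson r to the t statistic for testing r = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Assumption: 80 subjects spread evenly over the 8 conditions (error df = 72).
N_PER_CONDITION = 10

# Correlations reported in the text for the Meaningful-Learning 2 condition.
reported_r = {
    "relevant ideas / relevant facts": 0.74,
    "relevant ideas / relevant subsumed facts": 0.62,
    "incidental ideas / incidental facts": 0.74,
    "relevant facts / incidental facts": 0.64,
}

t_values = {pair: r_to_t(r, N_PER_CONDITION) for pair, r in reported_r.items()}
for pair, t in t_values.items():
    print(f"{pair}: t({N_PER_CONDITION - 2}) = {t:.2f}")
```

Whether a given t exceeds the tabled critical value depends on the sample size assumed above and on whether a one- or two-tailed test was used, neither of which is specified here.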


Recall in Related and Unrelated 
Paragraphs 

A separate analysis of variance was per- 
formed for Meaningful-Learning 2, Rote- 
Learning-of-Ideas 2, and Task-Irrelevant 2 
questions using one within-subjects factor 
consisting of the following four levels: (a) 
relevant ideas, (b) total related facts (rele- 
vant facts and relevant subsumed facts 
combined), (c) incidental ideas, and (d) 
incidental facts. Since the total number of 
facts from the related and the unrelated 
paragraphs was slightly different (29 for 
related, 26 for unrelated), the data were 
converted into percentages for these analyses. 
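
The percentage conversion can be illustrated with the Meaningful Learning 2 cell means from Table 1. This is a sketch of the arithmetic only: the authors presumably converted individual subjects' scores rather than cell means, and the sketch assumes that incidental facts are the facts drawn from the unrelated paragraphs. The totals for ideas are not given in this section, so ideas are omitted. The name `percent_recalled` is illustrative.

```python
RELATED_FACT_TOTAL = 29    # facts in the related paragraphs (from the text)
UNRELATED_FACT_TOTAL = 26  # facts in the unrelated paragraphs (from the text)


def percent_recalled(mean_recalled, total_items):
    """Express a recall score as a percentage of the items available."""
    return 100.0 * mean_recalled / total_items


# Meaningful Learning 2 cell means from Table 1.
total_related_facts = 4.70 + 10.40  # relevant facts + relevant subsumed facts
incidental_facts = 10.00            # assumed to come from the unrelated paragraphs

related_pct = percent_recalled(total_related_facts, RELATED_FACT_TOTAL)
unrelated_pct = percent_recalled(incidental_facts, UNRELATED_FACT_TOTAL)

print(f"Related facts:   {related_pct:.1f}%")
print(f"Unrelated facts: {unrelated_pct:.1f}%")
```

Expressing both scores as percentages puts them on a common scale, so the unequal number of facts in the two kinds of paragraphs does not bias the within-subjects comparison.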

All three question types (Meaningful 
Learning 2, Rote Learning of Ideas 2, and 
Task Irrelevant 2) yielded a significant ef- 
fect due to the repeated measures (F = 
13.66, p < .01; F = 15.69, p < .01; F = 
5.86, p < .01, respectively). 

Further analyses (Newman-Keuls proce- 
dure) demonstrated that while the Task- 
Irrelevant 2 questions did not differ signifi- 
cantly in the recall of information (facts or 
ideas) from the related or the unrelated 
paragraphs, both the Meaningful-Learning 
2 and the Rote-Learning-of-Ideas 2 ques- 
tions did produce significantly (p < .05) 
more information (either of facts or ideas) 
from the related paragraphs than from the 
unrelated paragraphs. These data suggest 
that subjects given either Meaningful- 
Learning 2 or Rote-Learning-of-Ideas 2 
questions reviewed the material for answers 
to the questions posed (see Discussion sec- 
tion). 


DISCUSSION 
Forward- and Backward-Processing Effects 


In concert with previous investigators 
(e.g., Rothkopf & Bisbicos, 1967), the pres- 
ent results clearly indicate that text-rele- 
vant postquestions can produce a substan- 
tial amount of recall of information that is 
both questioned and not questioned. In the 
present study, such effects were found with 
both meaningful-learning and rote-learn- 
ing-of-ideas questions when placed after 
every two paragraphs of text (Meaningful 
Learning 2 and Rote Learning of Ideas 2, 
respectively). What processing activities 
produced these effects? 

Frase (1967) originally proposed that the 
learner activities associated with adjunct 
postquestions involved either a forward 
process (shaping or eliciting appropriate 
reading skills) or a backward process (re- 
viewing of previously read material for an- 
swers to questions posed). More recently, 
Frase (1968b) has indicated that both for- 
ward and backward processing probably 
occur with adjunct postquestions. There is 
some evidence (McGaw & Grotelueschen, 
1972) to support this contention at least 
with respect to verbatim postquestions (of 
the fill-in type, demanding literal recall of 
facts). Boyd’s (1973) data, however, sug- 
gest that verbatim postquestions similarly 
affect retention and rehearsal (backward 
process) but not attention (forward proc- 
ess). 

The separation in this experiment of text 
segments into related (questioned) and un- 
related (nonquestioned) paragraphs al- 
lowed a distinction to be made between for- 
ward and backward effects. 

A forward effect for the experimental 
questions would result in more recall than 
control (task-irrelevant) questions from 
both the related and the unrelated para- 
graphs. By operating in a forward manner, 
the experimental postquestions would be ir- 
relevant to both paragraphs in text seg- 
ments following them, and so any treatment 
effects due to these questions would be 
equally present in both kinds of para- 
graphs. On the other hand, because task- 
irrelevant questions are substantively irrel- 
evant to all paragraphs in the passage, they 

should result in equally low performance on 
the recall of material from either type of 
paragraph. A backward effect for the exper- 
imental postquestions would result in more 
recall than the task-irrelevant questions in 
the related paragraphs but not necessarily 
in the unrelated paragraphs. In this case 
the experimental postquestions would en- 
gender review of previously read material 
including the material actually questioned 
(specific backward effect) and perhaps in- 
formation adjacent to and/or thematically 
related to the material questioned (general 
backward effect). 

Since Rote-Learning-of-Facts 2 questions 
only produced significantly more questioned 
facts (relevant facts) than the control 
questions, we can conclude that only a spe- 
cific backward effect was produced by 
Rote-Learning-of-Facts 2 questions. 

Meaningful-Learning 2 questions, how- 
ever, produced more recall than task-irrele- 
vant questions in both the related and the 
unrelated paragraphs. Evidently, Meaning- 
ful-Learning 2 questions influenced process- 
ing behaviors related to attention (forward 
effect). The high correlation for Meaning- 
ful-Learning 2 questions between recall of 
facts and ideas from both types of para- 
graphs (r = .66, for related paragraphs; r 
= .74, for unrelated paragraphs) strongly 
suggests that the type of attentional process 
present was a set to subsume facts under 
given ideas. 


Moreover, there is evidence in this study 
which seems to indicate that Meaningful- 
Learning 2 questions also produced a back- 
ward effect. Substantially more facts and 
ideas were recalled in the related than in 
the unrelated paragraphs with Meaningful- 
Learning 2 questions but not with task-ir- 
relevant questions. It appears, therefore, 
that Meaningful-Learning 2 questions en- 
gendered both a forward effect (set to sub- 
sume) and a backward effect (review of 
questioned material). 

Relative to control questions, Rote- 
Learning-of-Ideas 2 questions yielded more 
recall of facts and ideas in the related para- 
graphs but not in the unrelated paragraphs. 
Also, significantly more information (facts 
or ideas) was recalled in the related than in 
the unrelated paragraphs. Thus, we can as- 
sume that a backward effect was produced 
by Rote-Learning-of-Ideas 2 questions. In 
this regard, it is interesting to note that 
while facts in the related paragraphs tended 
to be recalled together (r = .70), ideas and 
facts in these paragraphs were not so re- 
called (r = .30). In view of the above cor- 
relational data, it is quite unlikely that the 
backward process involved relating facts to 
ideas, as in subsumption. Perhaps, then, in 
the course of answering questions about the 
ideas in the related paragraphs, subjects 
given Rote-Learning-of-Ideas 2 questions 
reviewed or rehearsed facts in the para- 
graphs containing the questioned ideas. By 
dint of this review, these rehearsed facts 
tended to become associated together at re- 
call. 

There is also some evidence of a forward 
effect operating with Rote-Learning-of- 
Ideas 2 questions. Significantly more ideas 
in the related paragraphs were recalled in 
the Rote-Learning-of-Ideas 2 than in the 
Task-Irrelevant 2 question condition. While 
not significant (p > .05), Rote-Learning- 
of-Ideas 2 questions tended to produce more 
recall of ideas in the unrelated paragraphs 
as well (Rote Learning of Ideas 2 = 3.8; 
Task Irrelevant 2 = 3.1). Furthermore, 
with Rote-Learning-of-Ideas 2 questions, 
there was a significant relationship between 
recall of ideas in the related and the unre- 
lated paragraphs (r = .70 for Rote Learn- 
ing of Ideas 2; r = .03 for Task Irrelevant 
2). It seems that we might cautiously inter- 
pret these findings to mean that Rote- 
Learning-of-Ideas 2 questions produced a 
set to learn ideas (a forward effect). 

In comparing Meaningful-Learning 2 and 
Rote-Learning-of-Ideas 2 questions, it is 
interesting to note that while both types of 
questions yielded the same number of ideas 
from either the related or the unrelated 
paragraphs, Meaningful-Learning 2 ques- 
tions were significantly superior to Rote- 
Learning-of-Ideas 2 questions in terms of 
the recall of facts from either type of para- 
graph. As mentioned previously, with 
Meaningful-Learning 2 questions, ideas and 
facts in either the related or the unrelated 
paragraphs tended significantly to be re- 
called together (r = .66 and r = .74, re- 
spectively). This tendency was not ob- 
served with Rote-Learning-of-Ideas 2 ques- 
tions (r = .30 and r = -.03, respectively). 
The above evidence seems to suggest that 
Meaningful-Learning 2 questions engen- 
dered a subsumptive process, whereas no 
such processing occurred in the case of 
Rote-Learning-of-Ideas 2 questions. Thus, 
even though subjects provided with Mean- 
ingful-Learning 2 and Rote-Learning-of- 
Ideas 2 questions had the same number of 
ideas available at recall, the prior subsump- 
tion of the material associated with Mean- 
ingful-Learning 2 questions greatly facili- 
tated the recall of the more detailed factual 
material. Using lists of words, Tulving and 
Pearlstone (1966) found results analogous 
to those above in which a prose passage was 
employed. More specifically, they found 
that superordinate category names (poten- 
tial subsumers) were more effective in pro- 
ducing the subordinate words when present 
both during learning (presumably engen- 
dering a subsumptive process) and recall 
than at recall alone. 

In summary, it appears that different 
types of adjunct postquestions produced 
processing behaviors that varied in both di- 
rection and kind. Rote-Learning-of-Facts 2 
questions only produced a specific review (a 
specific backward effect) of the material 
questioned. Rote-Learning-of-Ideas 2 ques- 
tions, however, were evidently responsible 
for a general review (general backward ef- 
fect) of material thematically related and 
adjacent to the ideas questioned. There is 
also some evidence that Rote-Learning-of- 
Ideas 2 questions produced a set to learn 
ideas (a forward effect). Finally, Meaning- 
ful-Learning 2 questions apparently engen- 
dered both a set to subsume (forward ef- 
fect) and a review (backward effect) proc- 
ess. 


Frequency of Different Type Questions 


Unlike rote-learning questions (rote 
learning of facts and rote learning of ideas), 
meaningful-learning questions were influ- 
enced by question frequency. Also, only 
Meaningful-Learning 2 questions produced 
a significant amount of recall of material 
from both the related and the unrelated 
paragraphs. Apparently, these questions in- 
duced more thorough processing of the 
passage than did either of the rote-learn- 
ing questions. Thus, it seems that placing 
meaningful-learning questions relatively far 
apart may have overtaxed the subjects’ 
processing capacity, that is, produced ex- 
cessive “cognitive strain” (Bruner, Good- 
now, & Austin, 1956), thereby eliminating 
their advantage over the other types of 
questions. On the other hand, placing ver- 
batim-learning questions close together or 
far apart resulted in very little difference 
in cognitive strain, since the processing be- 
haviors associated with these question types 
were less extensive. 


Educational Implications 


Questions were effectively used in this ex- 
periment to induce meaningful learning, 
thereby supporting and extending the work 
of Ausubel and his colleagues (see Ausubel, 
1968) who have employed another form of 
orienting direction, advance organizers, for 
this purpose. Furthermore, the present 
study supports and extends the findings of 
research conducted by Rothkopf and Bis- 
bicos (1967) and Frase (1968c). More spe- 
cifically, adjunct postquestions were found 
to influence incidental learning. Further- 
more, the results imply that meaningful 
learning questions are superior to factual 
(rote-learning-of-facts) questions, even 
when the amount of time spent reading the 
passage is the same for subjects responding 
to either type of question. In order to be ef- 
fective, however, meaningful-learning ques- 
tions must be spaced relatively close to- 
gether, so as to minimize cognitive strain. In 
certain situations, such as the one found in 
the present study as well as in Boyd’s (1973) 
experiment, factual postquestions have only 
direct instructive effects, that is, they pro- 
duce learning of question-relevant facts, 
but they do not produce incidental learning 
of facts or general ideas. Under the same 
conditions, however, the present experiment 
suggests that meaningful-learning questions 
induce processing behaviors that favorably 
influence the recall of both relevant and 
incidental material. Since meaningful- 
learning questions contribute to the acquisi- 
tion of ideas as well as facts, it appears 
entirely reasonable to conclude that the 
material is learned in an organized rather 
than a discrete fashion, a most significant 
“fringe benefit” of meaningful learning. 
From 121 Illinois classes in four basic subject areas, Grades 6 through 
12, student perceptions were obtained on the emphasis given to cogni- 
tive objectives derived from the six levels of the Bloom taxonomy. 
Using covariance control for class size, grade level, and class type, dis- 
criminant analysis revealed three significant (probability less than 
.001) functions: the first, here termed “convergence-divergence” dis- 
tinguishes mathematics from language arts classes; the second, “syn- 
tax-substance” separates language arts and mathematics from science 
and social studies; and the third, “objectivity-subjectivity” distin- 
guishes science from social studies. The findings together with those 
from a prior study, if further replicated, may suggest objective 
empirical ways of identifying the structure of curriculum and instruc- 
tion. 


The curriculum has been logically divided 
into subject matter areas on a priori grounds 
since ancient times. The continuing influence 
of Aristotle, Comte, and other early deduc- 
tive theorists is reflected in most contempo- 
rary work in curriculum development (Ford 
& Pugno, 1964). In the last decade, psy- 
chologists have carried out several empiri- 
cal studies to explore the structure of the 
curriculum. The present paper extends this 
work by determining the emphasis given 
to various cognitive objectives (Bloom, 
1956) as perceived by classes in several 
subject areas. 

In the earliest attempt to identify vari- 
ables that distinguish the press on students 
in various major fields of study in higher 
education, Thistlethwaite (1962) analyzed 
10 measures of student perception on the 
Inventory of College Characteristics. Physi- 
cal sciences and mathematics students per- 
ceived strong press for Scientism, Compli- 
ance, and Vocationalism but weak press for 
Humanism and Independence. Humanities 
and social studies students perceived strong 
press for Humanism, Independence, and 


1 Requests for reprints should be sent to Herbert 
J. Walberg, College of Education, University of 
Illinois at Chicago Circle, Box 4348, Chicago, Il- 
linois 60680. 


Enthusiasm but weak press for Scientism, 
Compliance, and Vocationalism. 

Astin (1965) suggested that the student 
perceptions of classroom environment are a 
useful basis for classifying different subject 
fields empirically. A factor analysis of 
college-student questionnaire responses pro- 
duced three factors. Factor I was Foreign 
Language versus Social Science, with For- 
eign Language characterized by enthusias- 
tic instructors who knew their students by 
name while Social Science was charac- 
terized by little classroom discussion, little 
homework, and arguing with the instructor. 
Factor II was Natural Science versus Eng- 
lish and Fine Arts, with the former high on 
students not speaking in class and the latter 
high on class discussion, humor, and diverse 
opinions. Factor III was Business versus 
History, with more testing emphasis, less 
research emphasis, and duller instructors 
in Business. 

In a semantic-differential, factor-analytic 
study of sixth- through ninth-grade classes, 
Yamamoto, Thomas, and Karns (1969) 
found that mathematics and science classes 
rated high on a factor labelled “Vigor” 
(alive-large-strong-fast). Social sciences 
and language arts rated high on a “Cer- 
tainty” factor (safe-easy-usual-familiar). 
Anderson (1971) compared student per- 
ceptions of the social environment of 54 
science, mathematics, French, and human- 
ities classes in eight Montreal high schools. 
A multivariate analysis of covariance tested 
the relation of the 15 Learning Environ- 
ment Inventory scales to the subject matter 
classifications with class size, sex of stu- 
dents, and IQ as covariates; discriminant 
analysis was also employed. The first dis- 
criminant function (see Figure 1) separated 
mathematics from science, French, and 
humanities (English plus history) classes; 
mathematics classes were characterized as 
being high in friction and in favoritism but 
low on formality. The second function 
separated science classes seen as disorga- 
nized and satisfying from humanities 
classes seen as apathetic. The third function 
separated French from the other subjects 
with French seen as goal directed and dis- 
satisfying. 

Differences in criteria characterizing sub- 
ject areas, as well as different ways of 
grouping areas for analysis, made sum- 
mary of the foregoing studies difficult. 
Clearly, however, students perceive subject 
matter areas differently. The purpose of the 
present study was to examine the differ- 
ences among four subject areas on student 
perception of the emphasis given to various 
cognitive processes. In an exploratory study 
such as this, it seemed reasonable to con- 
centrate on subject areas generally con- 
sidered “core” or basic subjects. The value 
of using subject areas that allow at least a 
partial replication of previous work (An- 
derson, 1971) also supported the choice of 
subject areas. 


METHOD 


Data were collected from 121 independently 
constituted classes in 69 Illinois [schools] as part of a 
statewide evaluation of gifted programs (Steele, 
House, & Kerins, 1971). Over 20 classes represent- 
ing each of the four basic subject areas of science, 
mathematics, social studies, and language arts, and 
ranging from Grades 6 to 12 composed the sample. 
About one third of the classes in each subject area 
were composed of students identified as gifted. 

Twenty-three items from the Class Activities 
Questionnaire were used to obtain the students’ 
perception of prevailing patterns of instructional 
emphasis. Students were asked to agree or disagree 
on a 4-point scale to items describing the general 
kinds of activities which characterized their class. 
These items were derived from six levels of cogni- 
tion on the Bloom (1956) Taxonomy and several 
affective conditions that might be stressed in class 
(see Table 1, which contains the items with signifi- 
cant loadings on discriminant functions). A de- 
scription of the development of this instrument 
and evidence for reliability are presented in Steele, 
House, and Kerins (1971). 


Analysis 

A multivariate analysis of covariance with dis- 
criminant functions was carried out on the class 
means of the items. To introduce as much statisti- 
cal control as possible, several possibly contaminat- 
ing effects were removed from the criteria by 
covariance adjustments before testing the differ- 
ences among the four subject matter areas. The 
covariates included the following variables: linear 
(grade level, class size, and type [coded 1 if desig- 
nated gifted, 0 otherwise]); quadratic (grade level 
squared and class size squared); and interactions 
(Grade Level × Class Size, Grade Level × Type, 
and Class Size × Type). 
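
The eight covariates listed above can be sketched as one row of a design matrix per class. The code below is a hypothetical illustration rather than the authors' procedure: the function name, the dictionary keys, and the example class are invented, and the last interaction is read as Class Size × Type, which is an assumption about the abbreviated label in the text.

```python
def covariate_row(grade_level, class_size, gifted):
    """Build the linear, quadratic, and interaction covariates for one class."""
    g = float(grade_level)
    s = float(class_size)
    t = 1.0 if gifted else 0.0  # type: coded 1 if designated gifted, 0 otherwise
    return {
        # linear terms
        "grade": g,
        "size": s,
        "type": t,
        # quadratic terms
        "grade_sq": g * g,
        "size_sq": s * s,
        # interaction terms
        "grade_x_size": g * s,
        "grade_x_type": g * t,
        "size_x_type": s * t,  # assumed reading of the third interaction
    }


# Hypothetical example: a Grade 8 gifted class of 25 students.
row = covariate_row(grade_level=8, class_size=25, gifted=True)
for name, value in row.items():
    print(f"{name}: {value}")
```

Adjusting the class means for these terms before testing subject-area differences removes linear, curvilinear, and joint effects of grade, size, and class type from the criteria.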


RESULTS AND DISCUSSION 


Table 1 shows the magnitude and sig- 
nificance of the discriminant variates: the 
first discriminant function accounts for 
69% of the variance; the second function, 
17%; and the third, 13%. The chi-square 
approximations to Wilks' lambda (Bock, 
1963) for all three functions are highly 
significant (probability less than .001). 
Figure 1 shows the positions of subjects on 
the discriminant functions and Anderson’s 
(1971) Montreal functions with the zero 
points set at science in both cases for com- 
parability. 
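
The last numeric column of Table 1 (69, 44, 21) appears to give the degrees of freedom of the chi-square tests. Those values are consistent with the standard rule for successive tests of discriminant functions, in which the test of the functions remaining after the first k have been removed has (p - k)(g - k - 1) degrees of freedom for p criterion variables and g groups. The sketch below checks this with p = 23 items and g = 4 subject areas; the function name is illustrative, and reading the column as degrees of freedom is an interpretation rather than something the text states.

```python
def successive_chi_square_df(n_variables, n_groups):
    """Degrees of freedom for testing the functions left after removing the first k."""
    n_functions = min(n_variables, n_groups - 1)
    return [(n_variables - k) * (n_groups - k - 1) for k in range(n_functions)]


# 23 Class Activities Questionnaire items, 4 subject areas.
dfs = successive_chi_square_df(n_variables=23, n_groups=4)
print(dfs)  # [69, 44, 21]

# Percentages of variance from Table 1; the three functions exhaust the variance.
percent_variance = [69.12, 17.50, 13.38]
print(round(sum(percent_variance), 2))
```

With four groups there can be at most three discriminant functions, which is why the percentages of variance sum to 100.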

“Loadings,” that is, the correlations of 
the criterion variables with the discriminant 
functions, were calculated for each Illinois 
function. Fifteen of the 23 Class Activities 
Questionnaire items had significant loadings 
(probability less than .01) on the first func- 
tion. Six items had significant loadings on 
the second function and 8 items had sig- 
nificant loadings on the third function.² The 
items with significant loadings on the first 
function, which is termed “convergence- 
divergence” and which contrasts language 
arts and mathematics (see Figure 1) sug- 


2 Interested readers can write the authors for a 
copy of the complete questionnaire and loadings. 
TABLE 1 

VARIANCE ACCOUNTED FOR AND SIGNIFICANCE 
OF DISCRIMINANT FUNCTIONS 

Discriminant function          % variance   Chi-square    df 

1. Convergence-divergence         69.12       262.78*     69 
2. Syntax-substance               17.50       114.07*     44 
3. Objectivity-subjectivity       13.38        51.24*     21 

* Significant beyond the .001 level. 


gest the two well-established psychometric 
factors—verbal and number. The variables 
associated with the language arts side of 
the dimension have to do with the cognitive 
operations: interpretation (Items 16 and 
6), evaluation (Item 20), synthesis (Item 
23), and translation (Item 9). Affective 
processes include student independence 
(Item 14) and participation in discussion 
(Item 5). The items associated with this 
pole of the discriminant function all sug- 
gest an openness to or consideration of 
many alternative “answers” or the creation 
of new ones. Associated with mathematics 


[Figure: positions of the subject areas on the three discriminant functions (“Convergence-Divergence,” “Syntax-Substance,” and “Objectivity-Subjectivity”), shown separately for the Illinois and Montreal studies, with science set at 0.0 in each panel.] 
FIGURE 1. Subjects in discriminant space. 
are variables dealing with analysis (Items 
12 and 7), memory (Items 1 and 10), the 
affective conditions of test and grade stress 
(Items 8 and 22), the absence of humor 
(Item 25), and little discussion (Item 15). 
A reading of these items reveals the strong 
flavor of converging on a single “right” an- 
swer in a methodical, no-nonsense way. 
While the naming of functions is in the last 
analysis an intuitive act, the tenor of the 
items themselves supports the labels se- 
lected. Figure 1 shows that in the first and 
largest discriminant functions of the Illi- 
nois and Montreal studies, mathematics is 
sharply distinguished from other subjects 
and appears to be seen as unpleasant. The 
similarity of these findings, despite the use 
of different instruments, is striking. 
Loadings on the second function, termed 
“syntax-substance,” represent a contrast of 
language arts and mathematics with science 
and social studies. The function appears to 
represent Schwab’s distinction of tool 
and application subjects in the curricu- 
lum. The variables associated with mathe-
matics and language arts include the cogni-
tive operations: synthesis (Items 23 and
11), translation (Item 9), and application 
(Item 3). Associated with science and 
social studies are the cognitive processes of 
summarizing (Item 21), memorizing (Items 
1 and 10), and evaluating (Item 20). The 
second function in the Montreal study is 
similar to this function in that the study of 
French is more syntactical than substan- 
tive. 

Loadings shown on the third discriminant 
function termed “objectivity—subjectivity” 
contrast science with social studies. The 
variables associated with science include in- 
dependent exploration (Item 14), learning 
and memorizing (Items 1, 8, and 10), in- 
terpreting (Item 6), and synthesizing (Item 
23). Associated with social studies are vari- 
ables dealing with evaluation (Items 2 and 
20) and the absence of humor (Item 25). 
This function might well have been labeled 
“doing versus judging.” The items, how- 
ever, suggest an orientation to the seeking 
and processing of external data as opposed 
to reviewing information and its implica- 
tions in the light of a personal set of values. 
Thus, this function contrasts more objective 
acquisition of information in science with 
more subjective evaluation of information
in the case of social studies. Put simply, it is a
contrast of “what versus so what.” It is
somewhat similar to the third Montreal
function in that the humanities that are
subjective were also contrasted with sci-
ence.


CONCLUSION


Further mapping of cognitive processes
emphasized in other subject areas might 
help conceptualize the curriculum in psy- 
chological and empirical rather than logical 
and hypothetical terms. The coordinates 
found here were somewhat similar to An-
derson’s (1971); however, his were derived
with an instrument tapping sociopsycho- 
logical rather than cognitive classroom 
press. If replicated further, such orthogonal
coordinates may organize the relations of
the subject areas in objective, interpretable, 
parsimonious ways. Since cognitive proc- 
esses and aspects of the social environment 
transcend the curriculum, designing instru- 
ments to measure them which focus on the 
press or context of instruction may enable 
the further relating of psychological proc- 
esses to subject areas in ways not ordinarily 
thought of by curriculum makers, teachers, 
and students. For example, the present re- 
search raises such questions as, Is mathe- 
matics inherently convergent? If not, why 
is it apparently conveyed convergently to 
the student? Should the syntactical aspects 
of science and social studies be given more 
emphasis? Can some subjective aspects of 
science and some objective qualities of so-
cial studies and humanities be imparted?


CONDITIONS OF REPEATED TESTING


Two experiments investigated the acquisition of course material under
conditions of repeated testing. In Experiment 1, with limited study 
intervals, acquisition increased over trials of study followed by test- 
ing. In addition, adjunct information about the content of the test 
item pool also increased performance. In Experiment 2, with student- 
determined study intervals, there was little or no change in perform- 
ance associated with repeated testing. 


The present educational system requires 
that students learn from written instruc- 
tional materials. The common practice of 
introducing an assigned instructional pas- 
sage in a course is a one-shot opportunity in 
which a single study interval is followed by 
a test, the results of which, hopefully, dem- 
onstrate that some learning has occurred. 
Under conditions of repeated testing, at 
minimum, a second study interval is intro- 
duced and is followed by a test; that is, a 
second trial is introduced which allows addi- 
tional learning to occur and to be demon- 
strated. Repeated testing has been intro- 
duced by Jensen and Prosser (1969), but 
these investigators did not report any acqui- 
sition data. Acquisition data could be ob- 
tained with any of the computer-based in- 
structional management systems that are 
currently in use (cf. Baker, 1971). Another 
possibility for obtaining trial-by-trial acqui- 
sition data would be to utilize the “mastery 
learning” strategy described by Bloom (cf. 
Bloom, Hastings, & Madaus, 1971) or the 
“personalized system of instruction” de- 
scribed by Keller (1968). In the present 
studies, an instructional passage was as- 
signed to students, and for this passage, a 
large pool of test items was constructed to 
measure learning. A large number of tests 
were then constructed, each of which con-
sisted of a random sample of test items from
the pool, and each student was given a
different one of these tests during acquisition
on each trial.


* Requests for reprints should be sent to James


EXPERIMENT 1 


The main purpose of this experiment was 
to study acquisition under conditions of re- 
peated testing. An instructional passage was 
available for study only in a study room. On 
each trial, the students would come to the 
room, study the passage for a two-hour 
period, and then take one of the tests on the 
material. In this way, the available study 
interval on each trial was fixed at two hours. 
The amount of actual study time cannot be 
determined, of course, but it is probably less 
than the available study interval. This pro- 
cedure was followed five times for a total of 
five acquisition trials according to a schedule 
which each student individually prearranged 
prior to the first trial. It was expected that 
as the number of study trials increased, the 
amount of learning would increase. 

The second purpose of this study was to 
determine the effects on acquisition of the 
amount of information given to students 
about the content of the test item pool. All 
of the items in the pool were generated by 
applying a single rule to the instructional 
passage. Therefore, all items in the pool 
from which the tests were constructed could 
be said to represent the same item type 
(Anderson, 1972; Bormuth, 1970; Schles-
inger & Weiser, 1970). In the no information
condition, no information adjunct to the in-
structional passage was given to students
about the content of the item pool from 
which their tests would be constructed. In 
the rule information condition, students were 
given a statement of the rule that was used 
to generate the test items in the pool, and 
illustrations of the use of the rule were also 
given. In the question information condition, 
students were given the actual test questions 
which comprised the pool. These conditions 
represented the amount of information given 
about the content of the test item pool and 
were similar to procedures which provide ad- 
junct encoding cues to increase learning (cf. 
Crouse & Idstein, 1972). Acquisition was 
expected to increase over the no information, 
rule information, and question information 
conditions, respectively, since with increased 
information about the content of the ques- 
tion pool, more information required by out- 
put which might not otherwise be encoded 
should be encoded into memory (cf. Crouse 
& Idstein, 1972). 


Method 


Materials. An article entitled “Reforms As Ex- 
periments” by Campbell (1969) was chosen from 
the reading materials of a course in educational 
research procedures. A 117-question test item pool 
was constructed by applying the following rule 
117 times throughout the passage. A sentence was 
taken directly from the passage and expanded, if 
necessary, to include words obviously understood 
from the context of the passage but not included 
in the sentence. From this expanded sentence, a
portion was identified, and a wh- question was 
derived to question this portion. For example, the 
sentence, “Many of the difficulties lie in the in- 
transigencies of the research setting and in the 
presence of recurrent seductive pitfalls of inter- 
pretation,” was expanded according to the context 
of the article to read, “Many of the difficulties of 
an experimental approach to social reform lie in 
the intransigencies of the research setting and in 
the presence of recurrent seductive pitfalls of in- 
terpretation.” From this expanded sentence, the 
wh- question, “Where do many of the difficulties of 
an experimental approach to social reform lie?” 
was derived. From the pool of 117 test questions a 
large number of 21-item tests were constructed. 
The tests were constructed by a computer program 
which randomly sampled 21 items for each test 
from the pool of 117 items. The tests had a com- 
pletion format in which each question was typed 
with a space where its answer was to be supplied 
by the student. 
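
The test-construction step described above can be sketched in a few lines. This is only an illustration and not the program used in the study: the pool size (117) and test length (21) come from the text, whereas the function names, the fixed seed, sampling without replacement within a test, and the sorted item order are assumptions made here.

```python
import random

POOL_SIZE = 117    # number of questions in the item pool (from the text)
TEST_LENGTH = 21   # number of items on each test (from the text)


def build_test(rng, pool_size=POOL_SIZE, test_length=TEST_LENGTH):
    """Draw one test: item numbers sampled without replacement from the pool."""
    # Assumption: no item appears twice on the same test.
    items = rng.sample(range(1, pool_size + 1), test_length)
    return sorted(items)  # sorted only for readability (hypothetical choice)


def build_tests(n_tests, seed=0):
    """Build several independent tests from the same pool."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [build_test(rng) for _ in range(n_tests)]


# Example: one test for each of five acquisition trials for a single student.
tests = build_tests(5)
for trial, items in enumerate(tests, start=1):
    print(trial, items)
```

Because each test is drawn independently from the same pool, some items will recur across a student's tests, which is the overlap examined later in the analysis of repeated and nonrepeated items.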

Procedure. All students in the experiment were
enrolled in the course in educational research pro-
cedures. When each student reached the time in
the course to study the article entitled “Reforms
As Experiments,” he was given a general set of
instructions which explained to him that he was 
being asked to follow a very specific set of proce-
dures in the hope that rapid learning would occur.
Explanation of the procedures was in terms of
course improvement, and no mention was made of
an experiment. All studying was done in a special 
study room, and each student individually sched- 
uled five study periods in the room prior to his 
first appearance. When each student arrived, he 
was handed the article and given exactly two 
hours to study it. Immediately following this study 
period, he was given one of the tests and allowed 
unlimited time to complete it. This procedure was 
followed each time the student came to the study 
room, and students came to the study room five 
times during a two-week period. Thus, each stu- 
dent was allowed 5 two-hour acquisition trials 
for learning. No information was given students 
about their acquisition progress until after they 
had completed the fifth acquisition trial. 

When each student arrived at the study room 
the first time, he was assigned to one of the three 
information conditions: no information, rule in- 
formation, or question information. Students in 
the no information condition completed five ac- 
quisition trials in the above manner with no infor- 
mation adjunct to the passage being given about 
the content of the test item pool. Students in the 
rule information condition were given a written 
statement of the rule that was used to generate the
test items in the pool. Included with the state- 
ment were examples of the use of the rule, but 
these were applied to different written materials 
than the students would be reading. The students 
were told that the questions to test their learning 
would be constructed using this rule, and they 
were allowed to refer to the rule information as 
much as they wished during learning. The students
in the question information condition were given 
the entire pool of 117 questions from which their 
tests would be drawn. The questions were listed 
in the order in which their answers occurred during 
reading, and students were allowed to refer to 
these questions freely during learning. 

Design. Five acquisition trials were given under 
each of the three information conditions: no in-
formation, rule information, and question informa-
tion. Therefore, the design was a 5 × 3 factorial
with repeated measures on the first factor. Four- 
teen students enrolled in the course in educational 
research procedures at the University of Delaware 
were randomly assigned to each of the three in- 
formation conditions. 
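
As a check on the design, the degrees of freedom implied by 14 students in each of 3 conditions measured on 5 trials can be worked out directly. The sketch below assumes the conventional partition for a factorial with repeated measures on one factor; the function and key names are hypothetical. The values it yields (2 and 39 for conditions, 4 and 156 for trials, 8 and 156 for the interaction) agree with the degrees of freedom reported for the F ratios in the Results.

```python
def mixed_design_df(n_per_condition, n_conditions, n_trials):
    """Degrees of freedom for a conditions x trials design, repeated measures on trials."""
    n_subjects = n_per_condition * n_conditions
    df_conditions = n_conditions - 1
    df_subjects_within = n_subjects - n_conditions        # error term for conditions
    df_trials = n_trials - 1
    df_interaction = df_conditions * df_trials
    df_trials_x_subjects = df_trials * df_subjects_within  # error term for trials and interaction
    return {
        "conditions": df_conditions,
        "subjects_within_conditions": df_subjects_within,
        "trials": df_trials,
        "conditions_x_trials": df_interaction,
        "trials_x_subjects_within_conditions": df_trials_x_subjects,
    }


# Values from the text: 14 students per condition, 3 conditions, 5 trials.
df = mixed_design_df(n_per_condition=14, n_conditions=3, n_trials=5)
print(df)
```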


Results 


TABLE 1
MEAN NUMBER OF CORRECT RESPONSES FOR EACH INFORMATION CONDITION ON EACH TRIAL

                                       Trial
Information                1       2       3       4       5
No information           5.82    8.68    8.75    9.82   12.04
Rule information         4.65    7.18    9.18    8.54   10.65
Question information     8.11   12.00   13.93   16.04   16.90


The number of correct answers on each
acquisition trial was computed for each stu-
dent. An answer was counted correct if it
was a verbatim reproduction or a complete
and accurate paraphrase of the answer. It 
was counted one half correct if it failed to 
meet these criteria but was partially correct. 
Otherwise, it was counted as incorrect. The 
mean number of correct answers on each of 
the five acquisition trials is shown in Table 
1 for each of the three information condi-
tions. It can be seen that acquisition in-
creased over the five trials for all conditions
(F = 55.42, df = 4/156, p < .01). Also, ac- 
quisition differed significantly in the three 
information conditions (F = 14.74, df = 
2/39, p < .01), and Newman-Keuls tests 
showed that performance was greater in the 
question information condition than in the 
rule information and no information condi-
tions (both, p < .01) and that these latter
conditions did not differ significantly (p > .05).
Finally, the differences among the three in- 
formation conditions increased over trials 
as indicated by their significant interaction 
with trials (F = 2.66, df = 8/156, p < .01). 

Each student was given a random sample 
of test items from the question pool on each 
trial; therefore, there inevitably was some 
repetition of questions across trials. A ques- 
tion which arises is the extent to which the 
acquisition data in Table 1 might represent 
gains in performance associated with the 
repeated questions over trials. For example, 


students might remember an item from one 
trial when it is presented on a later trial, 
and acquisition might predominantly reflect 
gains of this kind. Consequently, acquisition 
was examined over Trials 2-5 separately for 
those items that were repeated from an ear- 
lier trial and those items that were not re- 
peated. The number of items correct of each 
type was expressed as a percentage of the 
number of items which occurred of each 
type, and these percentages are summarized 
for each acquisition trial and information 
condition in Table 2. The results clearly in- 
dicate that acquisition increased over trials 
(F = 10.79, df = 3/117, p < .01). They also 
indicate that some item learning did occur 
since overall acquisition was higher for the 
repeated than nonrepeated items (F = 
18.86, df = 1/39, p < .01); however, acqui-
sition was not limited only to the repeated
items, as indicated by the nonsignificant
Trials × Repeated versus Nonrepeated
Items interaction (F < 1.00). The amount
of item learning interacted with the informa- 
tion conditions; item learning was greatest 
in the no information condition, somewhat 
less in the rule information condition, and 
least in the question information condition 
(F = 3.87, df = 2/39, p < .03). This latter 
result is to be expected since in the question
information condition, all items were avail-
able for inspection at all times during study.
Again, it can be seen that overall acquisition
differed among the three information condi-
tions (F = 14.36, df = 2/39, p < .01).


TABLE 2
PERCENTAGE CORRECT ANSWERS FOR NONREPEATED AND REPEATED ITEMS

                              Nonrepeated items                    Repeated items
Information             Trial 2  Trial 3  Trial 4  Trial 5   Trial 2  Trial 3  Trial 4  Trial 5
No information           38.72    35.44    40.40    46.00     49.38    48.66    58.70    62.64
Rule information         31.76    42.06    32.53    52.36     41.56    41.76    49.84    51.73
Question information     58.59    63.35    75.09    78.65     64.41    69.46    71.07    80.75
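
The separation of repeated from nonrepeated items can be illustrated with a short sketch. It assumes that an item counts as repeated on a trial if the same student met it on any earlier trial, and it uses the scoring values given in the text (1 for correct, one half for partially correct, 0 for incorrect). The function name, the data layout, and the example scores are hypothetical, and whether the published percentages were pooled over students or averaged within students is not stated in the text.

```python
def repeated_item_percentages(trials):
    """Percentage correct for nonrepeated and repeated items on Trials 2 and later.

    trials: list of dicts, one per trial in order, mapping item number -> score
            (1.0 correct, 0.5 partially correct, 0.0 incorrect) for one student.
    Returns {trial_number: (pct_nonrepeated, pct_repeated)}; an entry is None
    when no item of that type occurred on the trial.
    """
    def percent(scores):
        return 100.0 * sum(scores) / len(scores) if scores else None

    seen = set(trials[0])  # items already encountered on Trial 1
    result = {}
    for trial_number, scored in enumerate(trials[1:], start=2):
        repeated = [s for item, s in scored.items() if item in seen]
        nonrepeated = [s for item, s in scored.items() if item not in seen]
        result[trial_number] = (percent(nonrepeated), percent(repeated))
        seen.update(scored)  # items from this trial count as seen from now on
    return result


# Made-up scores for one student on three very short tests.
example = [
    {1: 1.0, 2: 0.0},
    {2: 1.0, 3: 0.5},
    {1: 1.0, 3: 1.0, 4: 0.0},
]
percentages = repeated_item_percentages(example)
print(percentages)
```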


Discussion 

The results indicate that an increased 
number of acquisition trials resulted in in- 
creased learning of course material. They 
also indicate that increased information 
about the content of the test item pool facili- 
tated performance. Almost all of this in- 
crease occurred when students were given 
the actual items from which their tests 
would be constructed (question information 
condition), with no apparent effect being 
seen when a rule was given by which the 
test items would be constructed (rule infor- 
mation condition). There are several possi- 
bilities to account for this finding: the rule 
may not have been helpful because the ex- 
treme extension of its use generated a poten- 
tial test question for every sentence in the 
passage; efforts to use the rule might have 
been abandoned because they disturbed the 
students’ usual study habits; the rule may
have been too complicated and was simply 
ignored; or other explanations are possible 
as well. Nevertheless, information about the 
specific items in the test pool did facilitate 
acquisition and did add further support to 
the principle that under circumstances 
where conditions adjunct to reading mate- 
rials can plausibly be assumed to influence 
encoding of information required by output 
that would not otherwise be encoded, out-
put performance will be facilitated (cf.
Crouse & Idstein, 1972). The important ele- 
ment lacking in this conceptualization as 
well as in all studies investigating the effects 
of adjunct encoding cues is how to concep- 
tualize the memory information that is re- 
quired for performance on the different 
kinds of output questions used to measure 
learning from prose instruction. In the ab- 
sence of this knowledge, the search for ad- 
junct encoding cues that will facilitate en- 
coding of relevant memory information and, 
therefore, increase performance on different 
item types will necessarily continue on a 
hit-or-miss basis.


EXPERIMENT 2


The results of Experiment 1 indicated 
that as the number of study trials increased, 
acquisition increased. Another way to say 
this is that students were motivated to do 
more learning than they could achieve on 
the early trials. This implies that the benefi- 
cial effects of repeated testing resulted from 
the use of a study interval as short as two 
hours on each trial, which restricted the 
amount of learning that could occur. Ex- 
periment 2 tested this possibility. Students 
were given the reading material so that they 
could take it home to study at their con- 
venience. When they wanted to, they could 
come to the study room and be tested on the 
reading. In this setting, the available study 
interval was student determined by the 
length of time before they came in to be 
tested. The amount of actual study time 
which occurred during this period was esti- 
mated by having each student keep a de- 
tailed record of his study. After being tested, 
the students could again take the passage for 
a second study interval, study as much as 
they wanted, and then come for testing. A 
total of four acquisition trials could be com- 
pleted in this fashion; the number each stu- 
dent completed was left optional. 


Method 


Materials. The materials were the same ones
used in Experiment 1; that is, the article entitled 
“Reforms As Experiments,” the 117-question test 
pool, and the batch of 21-question tests each com- 
prising a random sample of 21 questions from the 
question pool. 

Procedure. All students in the experiment were 
again enrolled in the course in educational research 
procedures. When each student reached the time 
in the course to study the article entitled “Re- 
forms As Experiments,” he was given general in- 
structions similar to those in Experiment 1 which 
explained that he was being asked to follow very 
specific procedures and that these procedures were 
justified in terms of facilitating learning and effect- 
ing course improvement. No mention was ever 
made of an experiment. Studying could be done 
anywhere, and each student was asked to record, 
to the minute, each starting and stopping time. 
How to do this was explained as was the use of a
form for recording. Short breaks from study were 
not to be included as study time. When ready, 
the students could come for testing, at which time 
they were given one of the tests with unlimited 
time for its completion. No information was ever
provided to the students about the content of the
test item pool. After completing a test, they could 
undertake additional study and follow it with a 
second test. Up to four trials could be completed 
in this fashion. Fifty-three students participated 
in this study-learning procedure. 


Results 


As expected, most students (n = 36) com- 
pleted all four acquisition trials; the num- 
ber of students who completed only one, 
two, or three trials was 5, 6, and 6, respec- 
tively. Because the number of students was 
substantial only for the four-trial group, 
subsequent analyses were done on these 
data. The number of correct answers on 
each of the four acquisition trials was com-
puted for each student. Answers were scored
in the same way as in Experiment 1; that is,
fully correct for a verbatim reproduction or 
completely accurate paraphrase, one half 
correct for a failure to meet the fully correct 
criteria but a partially correct answer, and 
incorrect. The mean number of correct an- 
swers was 4.26, 5.41, 5.30, and 4.98 on Trials 
1-4, respectively. These data show that per-
formance first increased slightly and then
decreased slightly over trials. While the 
analysis of variance revealed a significant 
F for these changes (F = 9.27, df = 3/105,
p < .01), the overall magnitude of these
changes was very small; that is, always less 
than 1.5 correct answers out of a possible 
21. It is as if the potentially unlimited study 
intervals, in contrast to Experiment 1, al- 
lowed the students to acquire as much learn- 
ing as they wished on Trial 1 but that with 
added trials they did not acquire more. 

The question arises as to the relationship 
between study time and acquisition. The 
actual time spent studying was available for 
each student, and since the greatest variabil- 
ity in study time was on the first trial, this 
first-trial data seemed most likely to reveal 
the expected positive relationship between 
the length of study time and performance. 
Unexpectedly, there was no sign of a positive 
correlation (r = .10, df = 34, p > .05). One 
explanation for this failure to find a positive 
relationship between study time and acqui- 
sition is that study time is negatively corre- 
lated with ability and that ability is, in 
turn, directly related to correct performance.
Consequently, the positive relationship be-
tween study time and performance could be
masked by ability differences. The data sup- 
ported this interpretation: for the 28 stu- 
dents completing four acquisition trials for 
whom measures of ability were available 
(verbal scores on the Graduate Record Ex- 
aminations), the correlation between study 
time on Trial 1 and ability was in the right 
direction and was significant (r = -.50,
p < .01). The correlation between ability
and correct performance on Trial 1 was also 
in the right direction but fell just short of 
significance (r = .30, p < .10). Most impor- 
tantly, however, when differences in ability 
were partialed out, the correlation between 
study time and correct performance was 
positive and significant (r = .42, p < .01). 

The results of both experiments lend fur- 
ther support to the importance of study 
time as a factor in learning. In Experiment 
1, with limited study intervals, study time 
increased over trials as did acquisition. In 
Experiment 2, with potentially unlimited 
study intervals, there was little increase be- 
yond Trial 1; but on Trial 1, study time was 
found to be positively related to perform- 
ance after differences in student ability were 
partialed out. It seems that repeated testing 
as used in its barest form in the present 
studies is important primarily as a way of 
increasing study time when the study inter- 
vals are less than optimal for achieving the 
desired learning. If study time can be in-
creased by other means, little or no advan-
tage may be found for repeated testing, at
least as used in the present studies.


PREDICTABILITY OF STUDENTS' EVALUATIONS OF COLLEGE


TEACHERS FROM COMPONENT RATINGS¹


GRACE FRENCH-LAZOVIK²
University of Pittsburgh 


Two similarly designed studies which were conducted 15 years apart at 
different universities and which involved over 9,700 students and 277 
faculty gave nearly identical answers to the question of what teaching 
characteristics carry greatest weight in predicting students’ general 
opinion of their teachers. Items used on student evaluation of teach- 
ing scales were treated as predictors of students’ overall ratings of 
teaching effectiveness. Reduced-rank regression analysis revealed high 
multiple correlations (.97 and .93) for items dealing with clarity of ex- 
position, arousal of student interest, and stimulation or motivation to 
intellectual activity. Neatness of appearance, friendliness of manner,
sense of humor, the giving of individual attention, and the handling 
of examinations carried little weight in predicting students’ evalua-
tions of effective teaching.


The use of student evaluations of college 
teaching, either systematically or infor- 
mally, as measuring devices has spread rap- 
idly on college campuses in recent years. 
Doubtless, pressures for accountability and 
student unrest have both contributed to this 
increase. Whatever purpose these evalua- 
tions are intended to serve—as feedback to 
teachers for their own improvement or as 
information to department chairmen for de- 
cisions—a better understanding of what 
students are evaluating in teaching is sorely 
needed. 

Much is known about the technical char- 
acteristics of student evaluations, for exam- 
ple, their reliability, correlations with class 
and student variables, and validity defined 
as correlation with measures of perform-
ance (for excellent reviews of this literature,
see Costin, Greenough, & Menges, 1971;
McKeachie, 1969). However, when human
judgment is used as a measuring device,
there are questions bearing on construct va-
lidity which should be asked. Could this
complex judgment be analyzed into the im-
portant components which contribute to it?
What is the relative influence of each com-
ponent? In the case of student judgments,
what kind of teaching do students evaluate
most highly?


¹ The University of Washington study was sup-
ported in part by a grant from the Carnegie Cor-
poration to Edwin R. Guthrie and in part by
Public Health Research Grant M-743(C2) and
Office of Naval Research Grant Nonr-477(08) to
Paul Horst. The University of Pittsburgh study
was supported by the Center for the Improvement
of Teaching, College of Arts and Sciences.

² Requests for reprints should be sent to Grace
French-Lazovik, Center for Improvement of
Teaching, University of Pittsburgh, Pittsburgh,
Pennsylvania 15260.

Many investigators have approached this 
last question by asking a similar though 
different one; that is, what do college stu- 
dents consider to be the most important 
characteristics of good teaching? The late 
Edwin R. Guthrie (personal communica- 
tion, approximate date, June 1947), whose 
unpublished studies of student evaluations 
of teaching began in the 1920s,³ sought an-
swers to this question by simply asking stu-
dents to describe the qualities they consid-
ered of greatest importance in good teach-
ing. A first list was condensed by content
analysis and presented to a second group of
students who were asked to check those
qualities which were characteristic of their
best teacher.


³ Some of these early studies resulted in the
five-item scale used at the University of Washing-
ton from 1946 through 1956. This scale appears in
the well-known monograph by Guthrie (1954).

This methodology, or minor variations of
it, has been used by many subsequent in- 
vestigators (Crawford & Bradshaw, 1968; 
Downie, 1952; Gadzella, 1968; Musella & 
Rusch, 1968; Riley, Ryan & Lifshitz, 1950; 
Smith, 1944) despite the fact that it leaves 
much to be desired, since it relies on the 
students’ ability to answer the question ad- 
equately, provides no way of knowing 
whether all relevant variables have been 
included, and does not reveal the relative 
differences in importance of the characteris-
tics mentioned or checked.

Factor-analytic studies of item pools in
this domain typically do not include a cri- 
terion measure. Thus, they give information 
on the dimensions being measured but not 
on which of these dimensions relate to the 
students’ concept of good teaching. It is 
quite likely that some of the dimensions 
measured do not contribute to the discrimi- 
nation made by students between good and 
poor teachers. 

In an early classic study, Coffman (1954) 
avoided both the problem of having stu- 
dents introspect as to what they valued in 
teaching and that of factor validity by fac- 
tor analyzing the 18-item Oklahoma A. and 
M. Rating Scale for Instructors to deter- 
mine what factors accounted for the stu- 
dents’ general estimate of teaching effec- 
tiveness. While this study provided a better 
methodology, there are two aspects of it in 
which major improvements can be made. 
First, Coffman's items were relatively few 
in number and were limited to those on one 
existing scale. Second, a more powerful de- 
sign would result by simply posing the same 
question in terms of a prediction model. 
The students’ general estimate of teaching 
effectiveness could be examined to deter- 
mine how well it can be predicted, what the 
best predictors are, and what the relative 
weight of each item in the prediction is.

Such a formulation involves several prob- 
lems. In the study of student evaluations, 
approximately 40 to 50 predictor variables 
would have to be considered and data on 
these variables could feasibly be obtained 
for perhaps 100 to 150 teachers. The prob-
lem of degrees of freedom then arises, for
whenever the number of variables is large 
in comparison to the number of cases, the 
determination of regression weights is us- 
ually considered not justified, since a stable 
solution cannot be expected. Fortunately, 
this problem can be avoided by using the 
reduced-rank regression model developed 
by Horst (1941). In his procedures, regres- 
sion coefficients are based on the principal- 
components factor matrix which usually 
has a considerably smaller rank than the 
intercorrelation matrix. In addition, Lei- 
man (1951) and Burket (1964) have shown 
that reduced-rank regression analysis re- 
sults in more stable regression weights than 
those based on traditional procedures. A 
special advantage for the study of judg- 
ments derives from the fact that regression 
coefficients are calculated from an orthogo- 
nal matrix. As a result, the product of the 
regression coefficient and of an item's corre-
lation with overall judgment can be rank
ordered to indicate the relative contribution 
of each item to the overall prediction. 

The recent University of California, 
Davis, study (Hildebrand, Wilson, & 
Dienst, 1971) did use a large item pool. 
However, the efficacy of their factor-ana-
lytic findings rests in large part on their
method of item selection, namely, the use of
discrimination indices. In addition to the
fact that the stringency of their criteria for 
inclusion in the extreme groups can be ques- 
tioned, these indices pose problems related 
to sample composition (Gulliksen, 1950). 

Further, Hildebrand et al. (1971) did not 
use methods which indicate the relative im- 
portance of the different criteria students 
use in evaluating teaching. 

The application of reduced-rank regres- 
sion analysis to a broadly representative 
item pool should further our understanding 
of what kind of teaching is evaluated most 
highly by students. 


PROBLEM 


The two nearly identical studies reported 
herein were designed to determine what 
teaching characteristics are most predictive 
of college students’ overall judgments of 
teaching effectiveness. The first study was
carried out at the University of Washing-
ton in the 1956-1957 academic year, the
second at the University of Pittsburgh in 
the 1971-1972 academic period. 


METHOD 


For this problem, the application of the reduced- 
rank regression model required that students reg- 
ister judgments of their teachers on a set of verbal 
descriptions phrased as items on a rating scale 
and that they also register judgments of their in- 
structor’s overall teaching effectiveness. The sep- 
arate items were then treated as a set of predictors 
of the overall judgment. By factor analyzing the 
predictor intercorrelations, using the principal- 
axis factor matrix to calculate the regression co- 
efficient for each item, and obtaining the product 
of these values and their respective correlations 
with the criterion, the relative contribution of each 
item in predicting the overall judgment was deter- 
mined. 

Statements describing teacher characteristics or 
classroom behavior were phrased so that they 
could be judged on a 5-point scale. The descriptive 
categories were stated in comparative terms that 
were the same for all items and for the overall 
judgment. The categories used were as follows: 


In this respect, your instructor ranks below 
most of the teachers you have known. 

In this respect, your instructor is only fair 
in comparison with other teachers you have 
known.

In this respect, your instructor is competent 
and compares well with the average of the 
teachers you have known. 

In this respect, your instructor is well above 
the average of the teachers you have known. 

In this respect, your instructor is one of the 
most outstanding you have known. 


It was important that the original set of items 
span the field of variables which could possibly 
be related to students’ concepts of good teaching. 
The initial pool depended on the investigator's 
guesses as to what the important variables might 
be, but the method of analysis provided a precise 
test of the degree to which the chosen items did, 
in fact, include all relevant variables; that is, the 
analysis indicated what percentage of the variance 
of the overall judgments was accounted for by the 
set of items. 


Item Collection: University of Washington 
Study 


To increase the probability that all relevant 
variables would be tapped, items were sought from 
different sources. The top 10 items from Guthrie’s
studies were included (see Footnote 3). A repre- 
sentative set of items was chosen from the rating 
forms used at many colleges and universities
throughout the country. Students’ general com-
ments about their teachers, collected over the
years from open-ended questions on the teaching 
evaluation form used at the University of Wash- 
ington, were also searched. And lastly, items based 
on faculty members’ views regarding important 
characteristics of good teaching were added. A
final set of 41 verbal statements represented the 
end product of the collecting, modifying, and 
rephrasing procedures (see Table 1). 


Item Collection: University of Pittsburgh 
Study 


Instead of sampling sources of items, the Uni- 
versity of Pittsburgh study tried to represent 
some of the factors found in factor-analytic stud- 
ies of student evaluation item pools (Bendig, 1954; 
Coffman, 1954; French, 1957; Hildebrand et al., 
1971; Hodgson, 1958; Isaacson et al., 1964; Reh- 
berg, Roberts, & Vandament, 1970; Remmers & 
Baker, 1952; Solomon, 1966). While the number of 
factors identified by these investigators ranged 
from 3 to 10, 4 to 6 of them appeared rather con- 
sistently. Authors showed little congruence in the 
names assigned to the various dimensions, so for 
the purposes of this study, 6 item clusters were 
chosen and renamed. These clusters were identi- 
fied as follows: (A) arousal of student interest, 
(B) clarity, (C) encouragement of student in- 
itiative, (D) teacher preparation, (E) teacher in- 
terest in students, and (F) grading. Items rep- 
resenting these clusters are marked by the 
appropriate letter in Table 1. The following 5 
items, not used in the University of Washington 
study, were also represented in the clusters: 


Has stimulated thinking on the part of the 
students (C), 

Has presented course content in an organized 
manner (D), 

Bases grades on adequate sample of student 
work (F), 

Gives feedback as to how students are progress- 
ing (F), and 

Has presented course content so that you 
perceive its relevance to your interests (new 
item). 


The 16 items in Table 2 were selected as a sub- 
set to represent these 6 clusters. 


University of Washington Sample 


A 25% sample of the faculty, exclusive of 
graduate professional schools, was selected so 
as to be representative of each subject taught and 
of each of the academic ranks from instructor 
through full professor. Letters requesting coopera- 
tion in the study and providing assurance that 
the results would be used only for research were 
written to the 144 faculty members selected; 133 or 
92% expressed willingness to participate. In the 
classes of 37 professors, 22 associate professors, 52
assistant professors, and 22 instructors, 3,654 rating
forms were completed and provided the data base
of the University of Washington study.


University of Pittsburgh Sample


A random sample consisting of 25% of each of 
the academic ranks from instructor through full 
professor in the College of Arts and Sciences was 
selected. These 160 faculty were asked permission 
to survey one of their undergraduate classes. Of 
these, 144 or 90% were surveyed and 6,120 rating 
sheets were completed in the classes of 42 pro- 
fessors, 32 associate professors, 51 assistant pro- 
fessors, and 19 instructors. 


Data Collection 


In both studies, surveys of student opin- 
ion of teaching were arranged at a time 
within the last three weeks of the term 
which was convenient to each teacher. Pre- 
cautions were taken to guarantee the stu- 
dent’s anonymity, to hold constant the in- 
structions to each class, and to dissociate 
the surveys from any administrative pur- 
pose in the minds of both the students and 
the faculty. Administering the question- 
naire entailed going into the classroom of 
each instructor at the appointed time, wait- 
ing for him or her to leave, giving explana- 
tions about the survey, distributing forms, 
answering questions, and receiving the com- 
pleted form from each student present. 


Analyses 


Data from the two studies were analyzed 
separately following identical designs. Analysis 
began with the computation of the average, for 
each item, of the ratings assigned by the members 
of a class to their teacher. These averages were 
intercorrelated for the predictor items, and this 
matrix was factor analyzed using Hotelling’s (1933) 
principal-components method. Factoring was 
stopped when the sum of the eigenvalues was ap- 
proximately equal to the sum of the item reli- 
abilities, calculated by Horst’s (1949) generalized 
formula. Regression coefficients were then ob-
tained for the predictors using the following for-
mula (Horst, 1941):


$$B = a D^{-2} (a' r_c)$$

where


$B$ = the vector of regression coefficients,

$a$ = the principal-components factor-loading
matrix,

$a'$ = the transpose of $a$,

$D$ = the diagonal matrix which is the product
$a'a$, and

$r_c$ = the vector of criterion correlations.

The elements of the vector $B$ are designated $B_i$
and those of the vector $r_c$, $r_{ic}$. The quantity
$B_i r_{ic}$ represents the product of the regression
coefficient and the validity coefficient for each item,
and these values were rank ordered. Lastly, the multi-
ple correlation of the predictors with the overall
judgment was determined for each study.
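
The computation described above can be sketched in a few lines of Python with NumPy. This is an illustration only, not the program used in either study. The function name `reduced_rank_weights`, its arguments, and the choice to pass the number of retained components `k` directly (instead of applying the eigenvalue and reliability stopping rule) are assumptions. The sketch takes the weights to be the least-squares weights for the rank-k approximation R ≈ aa', which gives B = a D^{-2} a' r_c with D = a'a.

```python
import numpy as np


def reduced_rank_weights(R, r_c, k):
    """Reduced-rank regression weights from a predictor correlation matrix.

    R   : (p, p) intercorrelations among the predictor items
    r_c : (p,) correlations of each item with the overall judgment
    k   : number of principal components retained (assumed to be given,
          and assumed to have positive eigenvalues)
    """
    R = np.asarray(R, dtype=float)
    r_c = np.asarray(r_c, dtype=float)

    # Principal components of R; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    keep = np.argsort(eigvals)[::-1][:k]

    # Factor loadings a = V * sqrt(lambda), so that a'a = D is diagonal.
    a = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    d = np.sum(a * a, axis=0)  # diagonal of D = a'a

    # B = a D^{-2} (a' r_c)
    B = a @ ((a.T @ r_c) / d**2)

    # Item contributions B_i * r_ic; their sum is the squared multiple R.
    contributions = B * r_c
    multiple_r = float(np.sqrt(contributions.sum()))

    # Rank 1 goes to the item with the largest contribution.
    ranks = np.empty(len(B), dtype=int)
    ranks[np.argsort(-contributions)] = np.arange(1, len(B) + 1)
    return B, contributions, multiple_r, ranks
```

When `k` equals the number of items and R is nonsingular, these weights reduce to the ordinary least-squares weights; with fewer components the weights are constrained to the retained factor space, which is the source of the stability the method is credited with.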


RESULTS 


Table 1 presents the findings at the Uni-
versity of Washington, and Table 2 gives
the results at the University of Pittsburgh.

It is not surprising that the number of 
factors differs in the two studies since the 
item-sampling procedures were not the 
same. The high multiple correlations shown
in Tables 1 and 2 (R = .944 and .976, re-
spectively) show that the overall judgment can
be predicted with very considerable accu-
racy.

In answering the question, “Were all rele-
vant variables included in the set of items
chosen?”, it can be seen that since the
multiple correlations are very close to the
maximum obtainable, all of the reliable
variance was accounted for. Thus, no varia-
ble of importance in predicting students’
overall judgment of teaching effectiveness
was left unmeasured in either study. Other
sets of items, differently worded, might pre-
dict overall teaching effectiveness equally
well, but they would not include variables
different from those measured here.
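
As a quick check on what these multiple correlations imply, squaring them gives the proportion of variance in the overall judgment accounted for by each item set. The arithmetic below uses only the two reported values; the study labels as subscripts are added here for readability, and no correction for unreliability is attempted.

```latex
\begin{align*}
R^2_{\text{Washington}} &= (.944)^2 \approx .891 \\
R^2_{\text{Pittsburgh}} &= (.976)^2 \approx .953
\end{align*}
```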

With this established, the major problem
of these studies, that is, the relative impor-
tance of the predictors, can be examined by
inspection of the rank ordering of items in
Tables 1 and 2. (The number in parentheses
following each item indicates rank order.)
Comparison of the high-ranking items on
the two lists gives a picture of surprising
consistency across a 15-year time span and
two quite different campuses with different
student and faculty populations. Even more
interesting is the fact that 5 of the items
taken from Guthrie’s (1954) list for the
University of Washington study (knowl-
edge of subject, arousal of student interest,


examples and illustrations, clarity of expla-
nations) are in the top 10 of Table 1, and
the 4 items from his study used in the Uni-
versity of Pittsburgh study (clarity,
arousal of student interest, use of examples, 
knowledge of subject) are the top 4 of Ta- 
ble 2. These investigations span a period of 
over 40 years and represent different genera- 
tions of students. It is apparent that some 
of the criteria used by college students in 
judging their teachers are surprisingly re- 
sistant to change. 


Four items on the University of Wash-
ington list suggested by faculty (new view-
points or appreciations, broadening of in-
terest, increased skills in thinking, new
tools for attacking problems) ranked 1, 3, 8,
and 12, respectively. The University of
Pittsburgh study used only two of these
items (new viewpoints or appreciations,
new tools) and they ranked 6 and 13. There 
is, then, evidence of some degree of agree- 
ment between students and faculty as to 
what constitutes good teaching. 

The content of the 10 top-ranking items 
on the two lists is nearly identical. Every 
one of the top 10 items on the University of 
Pittsburgh list is either a duplicate, a modi- 
fication, or from the same cluster (Table 1) 
as the top 10 of the University of Washing-
ton study. These verbal statements relate to
clarity, organization, teacher’s knowledge, 
use of examples, arousal or broadening of 
interests, and motivation or stimulation of 
thought. 

It is informative to note the position in 
Table 1 of statements dealing with the han- 
dling of examinations. In both studies, these 
items carry little or no weight in predicting 
teaching effectiveness. Nor is this criterion 
predicted by such qualities as neatness of 
appearance, friendliness, sympathetic man- 
ner, sense of humor, or the giving of indi- 
vidual attention. Items like these appear on 
many scales that are used to ascertain stu- 
dent opinion of teaching. The evidence here 
suggests their presence can mislead instruc- 
tors into believing that improvements of 
these characteristics would be followed by 
higher general evaluations. 


DISCUSSION


Before interpreting the results, it is im- 
portant to ask, could the high multiple cor- 
relations obtained simply reflect halo ef-
fect? The tendency of raters to respond
similarly to all items on the basis of a 
global impression results in high item inter- 
correlations that reflect the presence of a 
general factor. Hodgson (1958) applied ro- 
tational models to the University of Wash- 
ington factor-loading matrix and found 
that a solution involving one general factor 
and eight primary factors gave the best ap- 
proximation to simple structure. The gen- 
eral factor accounted for approximately 5% 
of the variance, and Hodgson postulated 
that it represented halo effects. Thus, if 
halo or response style variables account for 
the general factor, their effect is so small 
that it cannot be regarded as the basis of the 
high multiple correlations obtained. (Addi- 
tional evidence on this point is presented in 
the Discussion section.) 

Interpretation of the University of Wash- 
ington and University of Pittsburgh results 
can be facilitated by focusing, in both stud- 
ies, on the content of the items that are 
most predictive of students’ overall judg- 
ments of teaching effectiveness. The content 
of the top 10 items on the two lists can be 
structured into larger categories. They 
could, for instance, be considered under the 
following three categories: clarity of expo- 
sition, arousal or broadening of interest, 
and motivation of intellectual activities or 
stimulation of thought. 


Clarity of exposition: 
Interprets difficult or abstract ideas 
clearly 
Makes good use of examples and illustra- 
tions 
Has presented the course in an organized 
manner 
Inspires confidence in his/her knowledge 
of the subject 


Arousal of student interest: 
Increased my interest in the subject 
Gave me new viewpoints or appreciations 
Broadened my interests 
Included worthwhile material not dupli- 
cated in the text 
Has presented many thought-provoking 
ideas 
Shows interest and enthusiasm in his sub- 
ject
Presented course so that you perceive its
relevance to your interests 


Motivation or stimulation of thinking: 
Stimulated thinking on the part of stu- 
dents 
Increased my skills in thinking
Motivated me to do my best work 


Some of the items grouped with clarity 
suggest means by which clarity can be 
achieved—organization of content, use of 
examples, conveying the teacher’s knowl- 
edge. There are, of course, many other con- 
tributors to clear exposition. 

It should be remembered that all of the 
students in the University of Pittsburgh 
study and most of those in the University 
of Washington study were undergraduates. 
At this level of academic maturity, students 
depend on their teachers for some degree of 
motivation in intellectual activities and for 
their involvement in the subject content. If 
these investigations were repeated on grad- 
uate students, who are presumed to be more 
self-motivated and to bring with them a 
developed interest in their field, the findings 
might be quite different. Similarly, other
differences could be expected for evening 
and summer school students. 

If the content categories have been cor- 
rectly identified, then it should be possible, 
using only 1 item from each category, to 
approach the multiple correlation obtained 
with the full set of predictors. To test this
possibility, a reanalysis was performed on
the University of Washington data using 
the following items: (a) Interprets abstract 
ideas and theories clearly; (b) Gets stu-
dents interested in the subject; and (c) Has
increased my skills in thinking. The multi- 
ple correlation for these 3 items with the 
criterion is .928, while that for the set of 41 
predictors is .944. In the University of 
Pittsburgh study, the 3 items used in a 
reanalysis were (a) Interprets difficult or 
abstract ideas clearly; (b) Has increased 
my interest in the subject; and (c) Has 
stimulated thinking on the part of students; 
for these 3 items, a multiple correlation of 
.965 resulted. This compares with a multi-
ple correlation of .976 for the set of 16 pre-
dictors. Further evidence to support Hodg-
son's (1958) conclusion that halo effects
were small can be offered here. If halo
accounted for these high multiple
correlations, then any subset of 3 items
should give equally high multiple correla-
tions. The multiple correlation for 3 items
from Table 1 whose regression coefficients
show that they carry little or no weight in
predicting the criterion (willing to give in-
dividual attention, avoids sarcasm, and re-
spects students’ opinions) is .421. This may
be taken as an estimate of the maximum
halo effect, though its true effect is proba-
bly less. Thus, the difference between .42
and .93 (or about 70% of the variance of
the general estimate of teaching effective-
ness) cannot be attributed to halo effect.
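
The subset comparison above rests on two small computations: the multiple correlation of a chosen subset of items with the criterion, and the difference between two squared multiple correlations. A minimal Python sketch follows. The function names and the toy correlation values are hypothetical and are not data from either study; only the rounded values .93 and .42 come from the text.

```python
import numpy as np


def subset_multiple_r(R, r_c, items):
    """Multiple correlation of a subset of items with the criterion."""
    idx = np.asarray(items, dtype=int)
    R_sub = np.asarray(R, dtype=float)[np.ix_(idx, idx)]
    r_sub = np.asarray(r_c, dtype=float)[idx]
    # R^2 = r' R^{-1} r, computed with a linear solve instead of an inverse.
    r_squared = r_sub @ np.linalg.solve(R_sub, r_sub)
    return float(np.sqrt(r_squared))


def variance_gap(r_high, r_low):
    """Difference in the proportion of criterion variance accounted for."""
    return r_high**2 - r_low**2


# Hypothetical 3-item correlation matrix and validities (not study data).
R_toy = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
r_toy = np.array([0.6, 0.5, 0.4])

print(subset_multiple_r(R_toy, r_toy, [0, 1, 2]))
print(variance_gap(0.93, 0.42))  # rounded values quoted in the text
```

The second call squares each correlation before subtracting, which is why the gap between .42 and .93 corresponds to roughly 70% of the variance and not to the 51-point difference between the correlations themselves.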

These analyses offer strong evidence that
clarity, the arousal of interest, and the mo-
tivation to intellectual activity are the ma-
jor determiners of students’ general evalua-
tion of teaching effectiveness. (It should be
emphasized that the purpose of these analy-
ses was to validate the categories, not to
suggest that only three items should be used
on a rating scale. Ideally, several items
from each category should be used.)

The categories described are not meant to 
imply independence. On the contrary, they 
would seem to be complexly intertwined.
Activities of the teacher which contribute to
one of these objectives may simultaneously
achieve one of the others. In using examples
and illustrations to clarify a concept, the
effective teacher chooses examples to add
interest or to help students perceive rela-
tionships to their own experience. Clarifica-
tion can also contribute to new viewpoints
or appreciations. It is possible that this in 


The University of Washington and Uni-
versity of Pittsburgh results show con-
gruencies with certain other findings. Where
investigators asked what students valued in
teaching, clarity or the qualities and activi-
ties which contribute to it (teacher’s knowl-
edge of subject, organization of content)
almost always appears near
the top of the lists (Crawford & Bradshaw,
1968; Downie, 1952; Gadzella, 1968; Guth- 
rie, 1954; Musella & Rusch, 1968; Smith, 
1944). Stimulation of thought appears in a 
few (Musella & Rusch, 1968; Smith, 1944). 
Except in Guthrie’s studies, the arousal of 
student interest is not mentioned, but in- 
stead, the teacher’s interest in the subject is 
listed (Downie, 1952; Gadzella, 1968). 

In comparing the University of Pitts- 
burgh and University of Washington results 
with the few factor-analytic studies in 
which factor relationship to overall teach- 
ing effectiveness has been examined (Coff- 
man, 1954; Hildebrand et al., 1971; Isaac- 
son et al., 1964; Solomon, 1966), the great- 
est agreement is with the findings of Coff- 
man and Isaacson et al. Two of the three 
factors Coffman found to be related to gen- 
eral estimate of teaching effectiveness can 
be interpreted as clarity and as the arousal 
of student interest, though he chose other 
names. (His third factor is not well defined 
and leaves some question regarding his 
interpretation of it as “verbal fluency.”) In 
addition, his evidence that personal appear- 
ance and punctuality in meeting classes are 
not related to students’ general estimate is 
borne out in the low rank ordering of these 
items in the University of Washington 
study. 

Isaacson et al. (1964) indicate that the 
first of their six factors correlates most 
highly with “all-around teaching ability.” 
The items describing this factor are as fol- 
lows: (a) Material is put across in an in- 
teresting way; (b) The intellectual curios- 
ity of the students is stimulated; (c) 
Things are explained clearly; and (d) The 
teacher is skillful in observing student reac- 
tions. The last item, (d), may be a contrib- 
utor to clarity, since the teacher who is 
sensitive to class reactions may perceive 
that an explanation is not being understood 
and may, consequently, alter his approach 
or go over the material again. 

There is obvious agreement with the fac- 
tor described by Hildebrand et al. (1971) 
as “organization/clarity” and possible over- 
lap with another labeled “dynamism/en- 
thusiasm,” though this latter conclusion
needs examination. They reported that a
rating item constructed to measure this fac- 
tor correlates more highly with overall 
teaching effectiveness ratings than do the 
other four factors (organization/clarity, 
analytic/synthetic approach, instructor-
group interaction, instructor-individual in- 
teraction). The actual wording reported for 
their (Hildebrand et al., 1971) dynamism 
item is as follows: “Enjoys teaching, is en-
thusiastic about his subject, makes the
course exciting, and has self-confidence [p.
23].”

It is interesting to note that the phrase
“makes the course exciting” was not among
the items factor analyzed (Hildebrand et
al., 1971, p. 8, Table 3), or in the factor
results (p. 18, Table 5). In this author's
opinion, “making the course exciting” does
not represent the factor “dynamism” but
rather is related to making the course inter- 
esting. Certainly it represents something 
achieved by the teacher, not a personality 
dimension. It is entirely possible that the 
insertion of this phrase in an item designed 
to measure dynamism accounts for its high 
correlation with overall effectiveness. 

Comparison with Solomon's (1966) re- 
sults is not appropriate for a number of 
reasons. His rating categories were based on 
quantity or frequency (never, rarely, some- 
times, frequently, always) rather than com- 
parative quality; many of his items are 
sharply different in nature from those used 
in most studies (Percentage of time teacher 
speaks; Amount of teacher factual speech; 
Number of unsolicited student comments);
thus, his factors are not very similar to
those found by others; and finally, his sam- 
ple consists of adult, evening school classes 
from five different schools treated as one 
population. 

A major difference between the results of 
other investigators and those of the Univer- 
sity of Washington and University of Pitts- 
burgh studies is in the weight given to per- 
sonality dimensions in the teacher. Asking 
students to describe good teachers resulted 
in frequent mentions of qualities such as 
sympathetic, friendly, enthusiastic, and en- 
ergetic. Friendliness and sympathetic man- 
ner are low in the University of Washington
ranking, while the item “Shows interest and
enthusiasm in his subject” ranked ninth.
Such personality qualities were omitted in 
the University of Pittsburgh study. The 
high multiple correlations in the University 
of Pittsburgh results and in those of Langen 
and Sorenson (1963) show that personality 
characteristics are not necessary in the pre- 
diction of student overall ratings of teach- 
ing. Enthusiasm on the part of an instructor 
may contribute to arousing students' inter- 
est in the subject, but there are obviously 
other ways in which the same goal can be 
achieved. 

When asked what they consider the im- 
portant characteristics of a good teacher, 
students may tend to generalize from those 
qualities that they have valued in other 
personal relationships to interactions with 
their college teachers. However, in evaluat- 
ing their instructors, students give high 
overall ratings to some teachers who are not 
judged to be dynamic, enthusiastic, sympa- 
thetic, or friendly. They may be quiet, 
scholarly, and a little aloof, or they may 
have many other combinations of personal- 
ity traits. (This is not to say that personal- 
ity factors do not play a part in the meth- 
ods a teacher chooses or is able to use suc- 
cessfully.) 

From the findings of the University of 
Pittsburgh and University of Washington 
studies, it can be concluded that the kind of 
teaching evaluated most highly by students 
is teaching which they judge to be clear in 
exposition, which arouses or broadens their 
interests, and which motivates or stimulates 
them to intellectual activity. These are 
broad objectives that can be achieved 
through the use of different teaching meth-
ods, in classes of different sizes, and by
teachers with different personalities. Such
objectives are far from easy to attain and
are perhaps more difficult to achieve in
some subject areas than in others. But be-
cause they represent changes in the learner,
they can become goals to be added to those
of the instructor for his or her students.
Awareness of what components of teaching 
students weight most highly should improve 
our interpretation of their evaluations and
provide direction for the construction of
rating instruments.


EFFECTS OF TEACHER SEX AND STUDENT SEX ON THE
EVALUATION OF COLLEGE INSTRUCTORS


PATRICIA B. ELMORE¹ AND KAREN A. LAPOINTE


Southern Illinois University at Carbondale 


This study assessed the influence of faculty sex and student sex in 
teacher evaluation. Twenty questions from the Instructional Improve- 
ment Questionnaire which directly evaluate instructor performance 
were analyzed using a two-factor analysis of variance. No interactions 
between faculty sex and student sex were found. Generally, there 
were no differences between the mean ratings given male and female 
faculty by male and female students. However, male instructors did 
receive higher ratings on “spoke understandably,” while female instruc- 
tors received higher ratings on “promptly returned homework and 
tests.” In addition, female students rated instructors higher on “speci-
fied objectives of the course.”


For at least a decade, a question of prac- 
tical importance to university faculty, ad- 
ministrators, and students as well as parents 
and state legislators has been, How can col- 
lege teaching be evaluated? An extensive 
survey conducted by Gustad (1961) re- 
vealed that student ratings were most fre- 
quently mentioned as the method of teacher 
evaluation used by almost 600 colleges and 
universities. 

Once a method or procedure was deter- 
mined to evaluate college teachers, another 
question was posed: Do particular charac- 
teristics of the instructor or the students 
affect the way the instructor is evaluated 
by the students in his class? Such character- 
istics as the sex of the faculty member being 
evaluated and the sex of the student doing 
the evaluation may affect the ratings given. 

Studies which have attempted to assess 
the influence of faculty sex on evaluations 
have found conflicting results. Two investi- 
gations (Elliott, 1950; Lovell & Haner,
1955) found no significant differences be-
tween male and female faculty.

In studies where differences have been
found, investigators are reluctant to place
much emphasis upon them. Spencer (1969)
reported that at the University of Illinois
there is a tendency for women to receive
higher ratings. He concluded, however, that
there was no meaningful relationship be-
tween sex and most Illinois Course Evalua-
tion Questionnaire items. Similarly, Heil-
man and Armentrout (1936) found higher
mean ratings for women than for men, but
the differences were not statistically signifi-
cant. In another study, Downie (1952)
found women instructors received higher
ratings than men in the extent to which they
brought new books and authors into the
classroom.

¹ Requests for reprints should be sent to Pa-
tricia B. Elmore, Testing Center, Washington
Square, Building C, Southern Illinois University,
Carbondale, Illinois 62901.

Results from studies examining the effect 
of student sex are as conflicting as the stud- 
ies related to faculty sex. Two studies 
(Goodhartz, 1948; Isaacson et al., 1964) 
found no differences between faculty ratings 


Purdue Rating Scale for Instruction found
two factors: instructor competence and in-
structor empathy. There were no significant
differences due to student sex when both fac-
tors were considered together. When instruc-
tor competence was considered alone, there
was a significant sex difference but the direc-
tion of the difference was inconsistent.

A couple of studies have found specific
differences between men and women stu-
dents. Bendig (1952) found that women stu-
dents rated their instructors (men) signifi-
cantly lower than the male students rated
them, whereas Elliott (1950) found that 
women students tended to give higher rat- 
ings in "presentation of the subject matter" 
than male students. 

Recently, studies involving faculty sex, 
student sex, and characteristics of the in-
structor, course, or teaching method have
found significant sex differences. Carney and
McKeachie (1966) found that women rate
life-oriented topics in psychology signifi-
cantly higher and science-oriented topics in
psychology significantly lower than men.

In addition, McKeachie, Lin, and Mann
(1971) reported that in four out of five 
studies relating student sex, achievement 
test results, and factors obtained from stu- 
dent ratings, instructors who received higher 
ratings in structure and feedback were more 
effective with women than with men stu- 
dents. 

The purpose of the present study was to 
determine if women faculty received signifi- 
cantly different ratings than men faculty 
and if this was related to the sex of the stu- 
dent doing the rating. 


METHOD 


Students at Southern Illinois University at Car- 
bondale evaluated 1,474 courses during 1971. From 
this data base, courses were matched on the basis 
of course number and sex of instructor. A matched 
sample of 38 pairs of courses evaluated by 1,607 
students was obtained. Of these 1,607 students, 
complete data was available for 1,259 students. 
The courses included in this study represent varied 
departments and colleges on the Carbondale cam- 
pus. 
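
The matching step can be illustrated with a short Python sketch. It assumes that "matched on the basis of course number and sex of instructor" means pairing a male-taught and a female-taught offering of the same course number; that reading, the function name `match_courses`, and the record keys are assumptions, and the course records are invented for illustration.

```python
from collections import defaultdict


def match_courses(courses):
    """Pair male- and female-taught sections that share a course number.

    `courses` is a list of dicts with the hypothetical keys
    "number" and "instructor_sex" ("M" or "F").
    """
    by_number = defaultdict(lambda: {"M": [], "F": []})
    for course in courses:
        by_number[course["number"]][course["instructor_sex"]].append(course)

    pairs = []
    for number in sorted(by_number):
        groups = by_number[number]
        # zip stops at the shorter list, so unmatched sections are dropped.
        pairs.extend(zip(groups["M"], groups["F"]))
    return pairs


# Invented course records (not study data).
sections = [
    {"number": "PSYC 201", "instructor_sex": "M"},
    {"number": "PSYC 201", "instructor_sex": "F"},
    {"number": "HIST 110", "instructor_sex": "M"},
    {"number": "ENGL 101", "instructor_sex": "F"},
    {"number": "ENGL 101", "instructor_sex": "M"},
    {"number": "ENGL 101", "instructor_sex": "M"},
]
matched = match_courses(sections)
```

Courses taught only by instructors of one sex contribute no pair, which is one way a pool of 1,474 evaluated courses could shrink to a small matched sample.
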

The questionnaire that was administered to the
students was the Instructional Improvement Ques-
tionnaire.² This questionnaire consisted of five
parts: Part I, instructor evaluation (Items 1
through 20); Part II, evaluation of course (Items
21 through 40); Part III, strengths and weaknesses
(Items 41 through 60); Part IV, research data
(Items 61 through 72); and Part V, optional items
(Items 73 through 100). The responses to Part I, which
required the students to evaluate different aspects
of the instructor’s performance on a five-category
scale, were used for analysis (see Figure 1).


RESULTS 


The students’ responses to each of the
twenty items were submitted to a two-factor
(Instructor Sex × Student Sex) analysis of
variance. Due to the extreme negative skew-
ness of the data, the level of significance
chosen was .01.

² Copies are available upon request from the
senior author.

The results of the analysis of variance re- 
vealed no statistically significant interac- 
tion between faculty sex and student sex on 
any of the 20 items. 

Only two significant differences were found 
between male and female faculty. Male in- 
structors received significantly higher mean 
ratings on Item 6, “spoke understandably,” 
than female instructors, while women re- 
ceived significantly higher ratings on Item 
15, “promptly returned homework and 
tests.” 

Only one significant difference between 
male and female students emerged. Female 
students rated instructors higher on Item 
13, “specified objectives of the course,” than 
did male students. 


DISCUSSION AND CONCLUSIONS


The present study found no interaction 
between faculty sex and student sex, and in 
general, there seemed to be few meaningful 
differences between male and female faculty. 
Only two differences, that men spoke more under-
standably and that women more promptly
returned homework assignments and tests,
were significant. It may be that men on the
average project their voices more effectively 
than women or that the tonal quality of 
their voices is more pleasing. Women may 
have more concerns over efficiency than 
men and, hence, pay more attention to re- 
turning tests and homework. Neither of 
these differences, however, seems to be prac- 
tically significant. 

Similarly, there seems to be little differ- 
ence between male and female students. The 
only significant difference found was that 
women students tended to rate their instruc- 
tors higher on specifying objectives of the 
course. Recently, McKeachie et al. (1971)
have shown that instructors rated high on
structure were more effective with women
than with men students. It may be that
structure as reflected in organization is
more salient and valuable to women than
to men students.
PART I: INSTRUCTOR EVALUATION: ITEMS 1 THROUGH 20

DIRECTIONS: The following twenty phrases relate to college-level teaching.
Evaluate how your instructor did in each of these aspects of teaching by
selecting the one response option (A through E below) that comes closest
to your judgment.

RESPONSE OPTIONS: OMIT ITEMS THAT DO NOT APPLY

A. Exceptional or outstanding performance
B. Very good performance
C. Good performance, all that I would normally expect in college-level teaching
D. Weak performance, instructor should be aware of some opportunity for
   improvement
E. Improvement definitely needed

1. Prepared for class
2. Made clear assignments
3. Set clear standards for grading
4. Graded fairly
5. Knew if students understood him
6. Spoke understandably
7. Answered impromptu questions satisfactorily
8. Showed an interest in the course
9. Gave several examples to explain complex ideas
10. Accepted criticism and suggestions
11. Increased your appreciation for the subject
12. Was dependable in holding class as scheduled
13. Specified objectives of the course
14. Achieved the specified objectives of the course
15. Promptly returned homework and tests
16. Showed an interest in students
17. Made assignments that helped you understand the course
18. Was available outside of class
19. Encouraged participation of students
20. In general, taught the class effectively

Figure 1. Part I, instructor evaluation: Items 1 through 20 on the Instructional
Improvement Questionnaire.


The present results were consistent with 
previous findings in which the majority of
studies found no significant or practical dif- 
ferences between ratings of male and female 
faculty or between male and female stu- 
dents’ ratings of faculty. 

Many variables, such as class size, teach-
ing method (lecture versus discussion),
course content, rank of instructor, and stu-
dent class, are uncontrolled in a large
evaluation study like this. This may tend to
mask differences which are present. Further
experimentation in which class size and in-
structor rank are controlled is being planned.
Sex differences do seem to occur in studies
which assess the interrelationships among
student sex, faculty sex, and student or in-
structor characteristics. Further research is
needed to clarify the relationship of student
sex, faculty sex, and such characteristics as
teacher warmth, structure, and feedback. If
sex differences are to be found, it seems that
they will be discovered in relationship with
these other variables.


SEX EFFECTS ON INSTRUCTOR EVALUATION 
CHANGING TEACHER AND STUDENT BEHAVIOR:
AN EMPIRICAL INVESTIGATION¹


THOMAS L. GOOD²
University of Missouri—Columbia 


Teacher behavior toward two different groups of target students was 
altered by presenting teachers with information about their previous 
interaction with the target children. About 40 hours of pretreatment 
and 40 hours of posttreatment data were collected in first-grade class- 
rooms drawn from three different schools, using the Brophy-Good 
dyadic observation system. Treatment procedures markedly altered 
both quantitative and qualitative aspects of teacher interaction with 
target students, and in addition, the behavior of the target students 
changed reciprocally. Changes in specified teacher behavior toward tar- 
get children were accompanied by many additional changes in un- 
specified teacher behavior toward both target and nontarget children. 
The effects of these behavioral changes on nontarget children were 
generally positive or at least nondetrimental. 


Classroom life is an uneven affair. Some 
students receive much more teacher contact 
than others (Good, 1970; Horn, 1914; Jack- 
son & Lahaderne, 1967; Jones, 1971; 
Kranz, Weber, & Fishell, 1970; Mendoza, 
Good, & Brophy, 1972; Sikes, 1971). Also, 
some students receive qualitatively superior 
teacher treatment (Brophy & Good, 1970b; 
deGroat & Thompson, 1949; Good & Bro- 
phy, 1972; Rist, 1970; Rowe, 1969; Silber- 
man, 1969). As the authors have noted pre- 
viously (Good & Brophy, 1971), when in- 
vestigators have looked for differential 
teacher behavior toward students differing 
in achievement, sex, or socioeconomic sta- 
tus, they have consistently found it. Low- 
achievement students, for example, usually 
receive considerably less opportunity to re- 
spond than high-achievement students. 


¹ This research was supported in part by Grant
MH 17907-01 from the National Institute of Men-
tal Health, U.S. Public Health Service, Thomas
L. Good, principal investigator. The authors grate- 
fully acknowledge the coding assistance of Candy 
Chazanow, Carolyn Evertson, Suzi Good, Teresa 
Harris, and Sue Jones and acknowledge the as- 
sistance of Susan Florence, Pat Hollowell, and 
Sherry Kilgore for typing the manuscript. 

² Requests for reprints should be addressed to
Thomas L. Good, Center for Research in Social 
Behavior, 111 East Stewart Road, University of 
Missouri, Columbia, Missouri 65201. 
JERE E. BROPHY


University of Texas at Austin


Thus, different students regularly receive
differential treatment from their teachers,
and at times such teacher behavior is
inappropriate. However, little is known about
what feedback procedures help teachers
change their behavior toward selected
students and how such changes affect both
target and nontarget students.

Seemingly, the easiest way to change
teachers’ behavior would be to make them
more aware of it through feedback. However,
teachers often reject feedback because
they do not agree with the criteria the
supervisor uses (McNeil, 1971). Consequently,
teachers usually do not change
their teaching along the lines suggested, and
they sometimes even deliberately move in
the opposite direction (Tuckman & Oliver,
1968).

In response to this difficulty, a model was
developed for presenting information to
teachers about their behavior in nonthreatening
ways (Good & Brophy, 1970). One
purpose of this study was to assess the
applicability of this model. A single interview
was selected as the treatment, because
in addition to testing the model, we wanted
to design a strategy that could be used to
give teachers feedback after observing
them. Details on the strategy are provided
in the Treatment section. A third purpose
of the study was to assess effects on both
target and nontarget students. If the treatment
resulted in changed teacher behavior, would 
this lead to reciprocal changes in student 
behavior? Behavior modification studies 
have demonstrated repeatedly that consist- 
ent teacher behaviors lead to predictable 
student responses. However, behavior modi- 
fiers and researchers in general have not 
typically investigated the effects of a treat- 
ment on nontarget students. Most classroom 
behavior modification studies have been 
case studies in which only one or a few 
target students were observed. If the class- 
room ecology is to be disturbed, it is impor- 
tant to assess how changes in teacher be- 
havior affect all students. For example, if 
teachers are requested to have more con- 
tacts with certain students, does this reduce 
their contacts with other students? Con- 
versely, perhaps the call for more frequent 
interaction with certain students has a ra- 
diation influence that results in the teacher 
having more individualized contacts with 
all students. 

Withall (1956) provided information and 
suggestions to teachers on student partici- 
pation rates. The teachers dramatically in- 
creased the participation level of students 
who had been low participators. Further- 
more, a radiation effect was observed: 
teachers’ interaction rates with nontarget 
students also rose significantly. These data 
strongly suggest that treatment effects may 
radiate to nontarget students, and they in- 
dicate the need for more research on unan- 
ticipated treatment effects. The call for 
teachers to behave in qualitatively different 
ways also needs empirical investigation, es- 
pecially concerning the possibility of disor- 
dinal effects. For example, perhaps teachers 
asked to praise more and to criticize less 
when dealing with target students will do 
this with all students. Other instructions
may have an opposite effect. Teachers
asked to stay with target students when 
they fail to respond correctly (by probing, 
repeating the question, providing clues) 
may begin to deal more expediently with 
nontarget students (give up easily, call on 
someone else). 

Turner, Foa, and Foa (1971) point out 
that some resources, like money, are poten-
tially exhaustible. If one continues to give
money, it eventually runs out. Other re- 
sources are like love: one can continue to 
give without extinguishing the source. More 
attention is needed to pinpoint teacher be- 
haviors that can be increased toward target 
children without reducing the expression of 
those same behaviors to other students. To 
summarize, the two major purposes of this 
study were as follows: (a) to see if a simple 
feedback strategy derived from a more gen- 
eral model would change teacher behavior 
and (b) to observe the effects of any 
changes in teacher behavior on both target 
and nontarget students. 


TREATMENT RATIONALE 


It was hypothesized that feedback on ob- 
servational data would be more acceptable 
to teachers than the feedback they usually 
get from supervisors. As noted elsewhere 
(Good & Brophy, 1970), supervisor com- 
ments too often boil down to “your way is 
wrong, do it my way,” leading teachers to 
become defensive. The treatment attempted 
to minimize this problem in two ways. 
First, it was based on objective observation 
of a large number of discrete interactions, 
rather than upon brief impressions. In fact, 
the observers in each teacher's classroom 
were not involved in the treatment, so that 
advice would be based strictly on objective 
data. Interviews were conducted by the au- 
thors, working from data sheets turned in 
by the observers. This stress on objective 
data was designed to reinforce the accepta- 
bility and credibility of the suggestions 
given to teachers. 

A second advantage resulted from the use 
of the individual student as the focus of 
analysis. By tabulating the teacher’s inter- 
action with each different student sepa- 
rately, it was possible to show that teachers 
were teaching some students appropriately 
and others inappropriately. In making sug- 
gestions for improvement, the authors were 
in effect saying “you are doing a fine job 
with Mary, now try to do the same kinds of 
things with Jane.” This is much less threat-
ening and more acceptable than “your way
is wrong, do it my way.”
METHOD


The treatment was confined to a single inter- 
view to see if teachers could change in response to 
feedback alone, without any retraining or continu- 
ing supervision. Earlier work had convinced the 
investigators that much inappropriate treatment 
of students occurs partly because teachers are not 
aware of what they are doing. Few teachers 
deliberately and consciously give up on a student 
or treat him with rejection and discouragement 
rather than with patience and encouragement. In- 
stead, they gradually drift into a pattern in which 
they are reacting to certain students inappro- 
priately without realizing it. If this is true, it fol- 
lows that a treatment which simply provides 
teachers with information to make them aware of 
what they are doing should be enough to make 
them able to change. 


Treatment Groups 


The treatment was focused on two types of
students in each classroom. The first (low-partici-
pant group) were students with low rates of inter-
action with teachers. These students seldom volun-
teered to answer questions or initiated interactions.
Correspondingly, the teachers did not seek them 
out very frequently. Thus, the low-participant 
group had low rates of interaction with teachers, 
both because they avoided the teachers and be- 
cause the teachers avoided them. 

A second group (the extension group) was
identified on the basis of their teacher's willing-
ness to provide them with a second response op-
portunity when they failed to succeed on the first.
Teachers usually did not persist in seeking re-
sponses from extension students if they failed on
their first opportunity. Instead, the teachers would
give them the answer or call on someone else. 
These students were labeled the "extension" group 
because the advice given to teachers was to extend 
their interactions with them in failure situations. 

Each classroom showed extreme variability on 
these measures, so that low-participant and ex- 
tension students could be identified in each class. 
There were also general differences from one 
teacher to the next. For example, one teacher 
stayed with her students over 50% of the time 
following an initial failure, while another stayed 
less than 20% of the time. However, within each 
class, there were some students with very high 
percentages and others with very low ones. The 
situation with low participants was similar. There 
were large differences between classrooms but 
even larger differences within classrooms, so that 
a group with strikingly low interaction frequencies
could be identified in each class.

In each classroom at least two students, and
more typically three or four, were identified for
each treatment group. The number was dependent
upon the data. If only two students stood out as
having strikingly low rates of interaction with the 
teacher, these students alone comprised the low- 
participant group. In another class where four 
students had strikingly low rates, the low-partici-
pant group would contain all four. This flexibility
helped ensure that each student in
a group was appropriately classified and that the
suggested treatment was, therefore, appropriate
for him.

A total of 21 low participants and 28 extension
students were identified in the eight classrooms.
The low-participant group included 14 girls and
7 boys, while the extension group included 8 girls
and 20 boys. The predominance of girls in the low-
participant group was expected, since girls typically
have fewer interactions with teachers than boys
and are generally less salient in the classroom
(Brophy & Good, 1974).

Perhaps the extension group contained more
boys than girls because of the sex difference favoring
girls in achievement at this grade level. To the
extent that teachers’ rates of staying with students
are affected by the probability that the second
question will be answered successfully, the
sex difference in achievement may have been a
factor. Both groups included students from all
achievement levels.


Treatment Interview 


The data were analyzed between semesters, and
interviews were scheduled early in the second
semester. Each teacher was seen individually. For
each interview, a list had been prepared containing
the names of four groups of students. On the
left side were the names of the low participants
and, under them, the names of the extension
students. On the right side, across from each
respective treatment group, were the names of the
contrast groups.

The contrast groups were students that the
teachers were treating appropriately in situations
where they were treating the treatment groups
inappropriately. The contrast children were
included for two reasons. The first was to stimulate
each teacher’s thinking and help her to “see” her
own behavior more clearly. Hopefully, the
opportunity to compare each treatment group with
its contrast group would help the teacher gain
insight into the reasons for her differential
treatment of these two groups. The other reason for
including contrast groups was our interest in
providing acceptable and nonthreatening feedback.
The contrast groups made it possible to point to
students with whom the teacher had been doing an
especially good job. Further, in suggesting changes,
it was possible to say “you are already doing this
with these contrast students; now try to do it
with these other students, too.”

Teachers were first shown the list containing
the names of the four groups and asked to respond
to it. They were puzzled by this request, and
most made vague responses. A couple indicated
that the low participants included “some of the
shyest students in the class.” Otherwise, the
teachers were unable to indicate common
characteristics within groups or common differences
between contrast groups.
The teachers were then told how the groups
had been selected and how they had been behaving 
differentially toward them. About half of the 
teachers were aware of their low rates of inter- 
action with some low participants. Their responses 
included “Deborah doesn’t need help and doesn’t
want to be bothered by anyone. She is very inde-
pendent”; “Carla is very shy. She is extremely
embarrassed when I call on her and she can’t
answer. I don’t want to embarrass her.” When a
teacher was aware of her low rate of interaction 
with a student, the student was typically a shy one 
that the teacher was afraid of embarrassing, al- 
though a few “independent” students were included 
here also. The low-participant group also con- 
tained “unknowns” in addition to shy and inde- 
pendent students. Teachers usually were not aware 
of their rates of interaction with such students, nor 
particularly aware of these students generally. 
They responded to questions about them with 
statements like “you know Sam there ... I guess I
don’t interact much with him. I don’t know him 
very well.” 

Attempts to get the teachers to compare the 
low-participant group with its contrast group 
yielded no clear differences, although a few trends 
were evident. Contrast students were more likely 
to be described as independent than as shy, and 
they were not seen as likely to become embar- 
rassed or upset if they could not answer a ques- 
tion. Thus, they contrasted well with the shy 
students in the low-participant group but not with 
the “independents” or the “unknowns.” 

After discussion, the teachers were asked to in- 
crease their interactions with low participants by 
calling on them more frequently and initiating 
more private interactions with them. The teachers 
readily agreed to this with regard to the “un- 
knowns,” but many had reservations when it came 
to the independent students (they don’t need it— 
leave well enough alone) and especially to the shy 
students (they will become embarrassed and up- 
set). 

One response here was that the teachers’ con- 
cerns were understandable but that these students’ 
willingness to respond in public situations could 
be improved without endangering their security. 
It was suggested that inhibitions can be eliminated 
only with practice and exposure to public response 
opportunities, especially through low-key oppor- 
tunities that result in successful student response. 
Teachers were encouraged not only to seek out the 
low participants more often for private contacts 
but also to deliberately call on them to increase 
their public response opportunities. We also re- 
minded the teachers that they did not have to 
force a student to respond if he did start to be- 
come flustered; they could help him by giving 
the answer or providing a clue. After varying 
amounts of discussion, the teachers agreed to try 
to increase private and public contacts with low 
participants, although a few teachers had reserva-
tions about a few students. The latter were mostly
high-achieving, independent types, and the teach-
ers felt (with some justification) that their con-
tact patterns were already optimal.

Similar treatment procedures were followed for 
the extension group. The teachers could identify 
no common characteristic of the extension stu- 
dents and no common differences between them 
and their contrast group. Also, none of the teach- 
ers had any awareness of their tendency to give 
up easily with the extension group or to be es- 
pecially persistent with the contrast group. Each 
teacher was surprised and mystified by the data 
on this variable. A few said things like “that child 
makes me nervous,” about individual extension 
students, but they were unable to name any 
attribute that was common to all or most of them.
Apparently, unknown individual differences in the 
students condition the teacher into quite predict- 
able behavior patterns, but this is done without 
any awareness on her part. 

All the teachers readily agreed to become more
persistent in seeking responses from the extension
students, either by repeating the question or by
providing help in the form of a clue or a new
question. Teachers expressed more interest in
changing this behavior (perhaps because they were 
so totally unaware of this aspect of their behavior) 
than they had in changing their behavior toward 
low participants. 


Data Collection 


Data were collected in eight out of nine first- 
grade classrooms that were already involved in a 
larger study of the relationships between teachers’ 
performance expectations and their behavior to- 
ward different children (Brophy & Good, 1974). 
Originally, there were three classes studied in each 
of three types of schools: upper-middle-class white, 
lower-class white, and lower-class black. However, 
one teacher at the upper-middle-class, white school 
moved out of town after the first semester. Teach-
ers were told that the investigators were interested
in observing differences in the classroom behavior 
of children who varied in achievement. In late 
September, each teacher supplied a list ranking her 
children in order according to the levels of achieve-
ment she expected from them. Teachers were also
asked to rank their students at two other times in
the year.
Sixteen 2½-hour observations were made in
each classroom with the Brophy-Good dyadic in- 
teraction system (Brophy & Good, 1970a). The 
resulting data pool provided information on the 
teacher-child interaction patterns of 259 children, 
based on 40 hours of classroom observation taken 
on 16 different days during a 3-month period. After
baseline data were collected, the treatment was ad- 
ministered and then an additional 3 months of 
behavioral data were collected.
During September, pairs of observers worked in 
each classroom to establish reliability (80% agree- 
ment) and to desensitize the teachers and children 
to their presence. After reliability was established 
(procedures are detailed in Brophy & Good,
1970a), the observers began to work singly. Ob-
servers did not see the teachers’ achievement
rankings and did not know which students had been 
singled out for special study. 


Data Analysis 


The raw data were first converted into measures 
designed to control for absences and to allow direct 
comparison among children in the same room. 
Frequency counts were converted to means, divid- 
ing each child’s totals by the number of observa- 
tions for which he was present. Other measures 
were percentage scores compiled according to the 
procedures detailed in Brophy and Good (1970a). 
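
The absence correction described above can be sketched as follows. This is a minimal illustration, not the authors' procedure in full: the function name and the example counts are hypothetical, and only the division of a child's total by the number of observations attended comes from the text.

```python
def mean_per_observation(total_count, observations_present):
    """Average frequency per observation session the child attended."""
    if observations_present <= 0:
        raise ValueError("child must be present for at least one observation")
    return total_count / observations_present


# Two hypothetical children with the same underlying rate of contact:
# one present for all 16 observations, one absent for 4 of them.
always_present = mean_per_observation(total_count=32, observations_present=16)
missed_four = mean_per_observation(total_count=24, observations_present=12)
```

Dividing by attendance rather than by the fixed number of sessions keeps a frequently absent child from looking like a low participant merely because fewer interactions could be recorded.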

The data for each class were then standardized 
(M = 0, SD = 1) to set the eight classes on a 
common scale and to eliminate variance due to 
teacher or class differences. Analyses of variance 
were then obtained from these standardized score 
distributions to test for significant changes in 
teacher interaction with target students. First, 
two series of one-way analyses of variance over 
repeated measures (pre- and posttreatment) were 
performed, in which the means on the standardized 
teacher-student interaction measures for each tar- 
get group, respectively, were compared with the 
means for all other children. Thus, these analyses 
compared the scores of 21 or 28 children with those 
of the remaining 238 or 231, respectively. In addi- 
tion, repeated measures analyses of variance were 
performed separately for each classroom, using 
nonstandardized means and percentage scores. 
These data reveal the extent to which treatment 
effects generalized across teachers and radiated to 
include nontarget children. 
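
The within-class standardization (M = 0, SD = 1) can be sketched as below. The data layout, names, and sample scores are assumptions for illustration, as is the use of the population standard deviation; the source does not say which SD formula was used.

```python
from statistics import mean, pstdev


def standardize_within_class(scores_by_class):
    """Convert each child's score to a z score relative to the child's own class."""
    standardized = {}
    for class_id, scores in scores_by_class.items():
        values = list(scores.values())
        class_mean = mean(values)
        class_sd = pstdev(values)  # population SD; an assumption, not from the source
        standardized[class_id] = {
            # A class with no variability gets z = 0 for every child.
            child: 0.0 if class_sd == 0 else (score - class_mean) / class_sd
            for child, score in scores.items()
        }
    return standardized


# Hypothetical raw means for two classes whose teachers differ in overall rate.
raw = {
    "class_1": {"child_a": 2, "child_b": 4, "child_c": 6},
    "class_2": {"child_d": 10, "child_e": 20, "child_f": 30},
}
z = standardize_within_class(raw)
```

In this example the lowest child in each class receives the same z score even though the raw rates differ fivefold, which is the sense in which standardizing removes variance due to teacher or class differences before children are pooled across classes.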


RESULTS


Achievement Expectations 


To see whether the treatment affected the 
teachers’ achievement expectations for tar- 
get students, teachers’ rankings collected in 
mid-November were compared with the set 
collected in March, after the treatment in- 
terview. These comparisons showed very 
little change in the rankings. Thus, despite 
the considerable changes in teacher-child 
interaction patterns to be described in the 
next section, the treatment did not affect 
the teachers’ achievement expectations for 
the target students. 


Teacher-Child Interaction Effects 
Involving Target Groups 


Comparison of teacher-child interactions 
involving target students before and after 
treatment showed clear-cut treatment ef- 
fects for both target groups. The data are 
summarized in Table 1. 
Extension Group


The most notable changes in teacher be- 
havior with the extension group occurred on 
the measure of staying with students fol- 
lowing an initial failure. Despite a marked 
tendency to give up easily with these stu- 
dents in the first semester, in the second 
semester the teachers stayed with these stu- 
dents just as often as they did with their 
classmates. Thus, the treatment brought the 
extension students from a state of clear dis- 
advantage to a state of parity with their 
classmates. 

There were also other changes in the 
teachers’ treatment of extension students. 
The teachers called on them more fre- 
quently and initiated more contacts with 
them during the second semester, and they 
praised them more frequently than they 
had in the first semester. In addition to the 
increased praise, an improvement in the ra- 
tio of behavioral warnings to behavioral 
criticisms also suggests an improvement in 
the teachers’ attitudes toward extension 
students. In the first semester, the teachers 
were prone to respond to misbehavior by 
extension students with criticism. In the 
second semester, they more often merely 
warned the students. However, even after 
treatment, extension students were more 
likely than their classmates to be criticized 
for failures to respond or for giving wrong 
answers. 

The change in teacher behavior toward 
extension students produced no notable 
changes in the behavior of the extension 
students toward the teachers. The only sig- 
nificant change was on the measure of 
wrong answers over wrong answers plus no 
response. The extension students had been 
higher on this measure prior to treatment, 
but they were equal to their classmates 
after treatment. This may reflect a decrease 
in the extension students’ willingness to
take a guess. However, given the great in-
crease in the teachers’ frequencies of stay-
ing with these students, the change proba-
bly occurred primarily because the teachers
began asking them more questions that 
they could not answer. On measures of per- 
centage of correct answers and frequencies 
of reading errors, the extension students
were similar to their classmates during both
semesters. The same is true for their fre-
quency of behavior contacts with the
teacher; this was the case even though most 
of these students were boys and despite the 
higher than average teacher criticism of 
this group. Apparently there was something 
about these students that made their teach- 
ers hypercritical of them, but it was not a 
higher frequency of misbehavior on their 
part. 
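
The measure of wrong answers over wrong answers plus no response, discussed above, is a simple proportion. The sketch below uses hypothetical names and counts; only the ratio itself is taken from the text.

```python
def wrong_answer_ratio(wrong_answers, no_responses):
    """Share of a child's failures that were wrong answers rather than silence.

    Returns None when the child had no failures, since the ratio is undefined.
    """
    failures = wrong_answers + no_responses
    if failures == 0:
        return None
    return wrong_answers / failures


# Hypothetical child: 6 wrong answers and 2 failures to respond.
ratio = wrong_answer_ratio(wrong_answers=6, no_responses=2)
never_failed = wrong_answer_ratio(wrong_answers=0, no_responses=0)
```

A higher value means the child more often ventured an answer when unsure, so a drop in the ratio can reflect either less guessing or, as the authors suggest, more questions the child could not answer at all.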

In summary, the teachers were able to 
significantly increase their rates of staying 
with extension students in failure situa- 
tions. Following treatment, the teachers 
stayed with these students just as often as 
they did with their classmates. They also 
sought them out for contacts, praised them 
more often, and were less prone to criticize 
them. However, extension students still re- 
ceived more criticism when they failed to 
respond correctly (but not more behavior 
criticism). There were no notable effects of 
the treatment upon the behavior of the ex- 
tension students themselves. 


Low-Participation Group 


The treatment was also successful in 
changing the teachers’ treatment of the low 
participants. Seven of eight measures of 
quantity of teacher-student contacts in- 
creased with the low-participant group; 
most of these, significantly. On some meas- 
ures, the low participants achieved parity 
with their classmates, while on others they 
still remained below average. In other 
words, the teachers significantly increased 
their interactions with low participants fol- 
lowing the treatment, but this group still 
had fewer contacts with the teachers than 
did their classmates. 

This activity of the teachers produced 
some complementary behavior on the part 
of the low-participant students. There was 
some tendency for them to seek response 
opportunities and contacts with teachers 
more frequently after the treatment. De- 
spite this improvement, however, they re- 
mained generally below their classmates on 
these measures. 

There were dramatic changes in the rates 
of teacher praise and criticism of low par-
ticipants. During the first semester, they
had been praised more frequently than their 
classmates for good behavior but not for 
successfully responding to questions or 
completing seatwork. The posttreatment 
data showed dramatic rises in teacher
praise for correct answers and good seat- 
work. There was also a corresponding drop 
on most measures of teacher criticism. 
These differences are probably related to 
the teachers’ concern about pushing these 
students too hard and about making sure to 
encourage them when they showed signs of 
progress. It should be noted, however, that 
it took the treatment to bring out this con- 
cern: during the first semester, the low par- 
ticipants earned special praise only for 
their behavior (conforming to classroom 
rules) but not for their success in answering 
questions or completing seatwork. Thus, the 
treatment focused teacher attention more 
closely on the academic performance of these 
students. 

The data for the difficulty level of teacher 
questions directed to the low participants 
showed mixed results. In general, the teach- 
ers asked them more difficult questions fol- 
lowing the treatment. Thus, even though 
most of the teachers expressed fear of em- 
barrassing them, this concern did not cause 
them to ask the low participants only easy 
questions during the second semester. The 
measures of teacher feedback also showed 
significant and beneficial changes. During 
the first semester, teachers frequently failed 
to give feedback to the low participants 
after responses, both during reading groups 
and in general class discussions. Also, 
teachers less frequently gave these students 
process feedback when they checked their 
seatwork. These differences disappeared fol- 
lowing the treatment, again suggesting that 
the teachers began to pay more careful at- 
tention to the academic responses of the low 
participants. 

To summarize the teacher effects, the 
treatment produced more widespread 
changes in the behavior of teachers toward 
the low participants than in their behavior 
toward the extension students. The teachers 
not only called on these students more often 
and initiated more individual contacts with 
them, they also asked them somewhat more 
difficult questions, praised them more fre- 
quently, criticized them less frequently, 
gave them more process feedback, and less 
often failed to give them feedback. Thus, 
the treatment seemed to make the teachers 
generally more aware of and concerned 
about the achievement of the low partici- 
pants. 

There were also some effects on the stu- 
dents. Most notably, the low participants 
showed a drop in their percentage of correct 
answers. It is difficult to say whether this 
change was good, bad, or indifferent. The 
changes were general but rather small, es- 
pecially in reading groups where most of 
the response opportunities occurred. During 
reading groups, low participants answered 
75% correctly the first semester and 64% 
the second. Meanwhile, their classmates 
dropped from 75% to 70%. The drop in 
percentage correct during general class dis- 
cussions, however, was much greater: low 
participants dropped from 71% to 55%, 
while their classmates only dropped from 
70% to 67%. Thus, one result of the teach- 
ers’ asking low participants more frequent 
and somewhat more difficult questions was 
that their percentage of correct responses 
dropped. Except for errorless learning advo- 
cates, most observers would see changes of 
this magnitude as positive, indicating 
greater teacher efforts to get the most out of 
these students (assuming, of course, a gen- 
erally supportive atmosphere). 
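
The size of the drop for low participants relative to their classmates can be worked out directly from the percentages quoted above. The sketch below is only an illustration of that arithmetic; the dictionary layout and the function names are assumptions made for the example and do not come from the study.

```python
# Percent-correct figures quoted in the text: (first semester, second semester).
PERCENT_CORRECT = {
    "reading groups": {
        "low participants": (75, 64),
        "classmates": (75, 70),
    },
    "general class discussions": {
        "low participants": (71, 55),
        "classmates": (70, 67),
    },
}


def drop(before, after):
    """Percentage-point drop from the first to the second semester."""
    return before - after


def excess_drop(setting):
    """How many more points low participants dropped than their classmates."""
    low = drop(*PERCENT_CORRECT[setting]["low participants"])
    mates = drop(*PERCENT_CORRECT[setting]["classmates"])
    return low - mates
```

Under these figures the low participants fell further than their classmates in both settings, and the gap is larger in general class discussions than in reading groups, which matches the description in the text.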

The posttreatment means for two class- 
rooms dipped to 38% and 31%, respectively, 
on the measure of percentage of correct an- 
swers during general class discussions. In 
these two instances, the treatment appar- 
ently caused the teachers to push too hard. 
However, the means for the other six classes 
on this measure and the means for all eight 
classes on the measure for correct answers 
in the reading group were in the 50% to 
80% range. Thus, although one effect of the 
treatment was to lower the percentage of 
correct responses made by low participants, 
their posttreatment percentages were still 
comparable to those of their classmates 
and, judging by the increases in rates of 
initiation of interactions with the teachers 
shown by these students, this did not cause 
fear or embarrassment. 

Other changes in student measures were 
more clearly positive. Although the changes 
were not nearly so dramatic as those in the 
data for the teachers, there are suggestions 
that the low participants became better be- 
haved, more likely to seek response oppor- 
tunities, and more likely to come to the 
teacher for help with their work during the 
second semester. They were still less likely 
than their classmates to seek out the 
teacher during the second semester, but 
they were more likely to do so than they 
had been before the treatment. 


Radiation Effects 


The treatment had clearly changed the 
behavior of the teachers toward both 
groups. We now turn to the question of how 
these changes affected the more general 
classroom ecology. Did the effects radiate? 
If so, was it to the benefit or the detriment 
of the classmates of the treatment students? 
To answer these questions, the raw (un- 
standardized) means for treatment and 
nontreatment students for each semester 
were examined for each separate classroom. 
These data are presented in Tables 2 and 3. 

One factor complicating our analyses was 
a general change (trials effect) from the 
first to the second semester on most scores. 
These changes resulted from changes in the 
nature of the activities going on in the 
classrooms at the time and were unrelated 
to the treatment study itself. For example, 
there were more reading turns during the 
second semester than the first. Changes of 
this sort tended to be constant across 
students within a classroom, however, and 
to be unrelated to treatment changes. 

The data regarding radiation of effects 
for extension students generally supported 
the expectation that there would be some 
radiation and that the radiation would be 
beneficial to the whole class. Analyses re- 
vealed that seven of the eight teachers 
showed large gains in their percentages of 
staying with extension students in failure 
situations, while one teacher did not 
change. Of the seven teachers who learned 
to stay with extension students longer in 
failure situations, three showed radiation 
effects such that they also tended to stay 
with other students longer. The other four 
teachers showed large gains for the exten- 
sion students and little change for their 
classmates. The teacher that did not change 
her behavior toward extension students did 
not change her behavior toward classmates, 
either. 

In summary, the treatment for the exten- 
sion students was effective with seven of the 
eight teachers, and gains for extension stu- 
dents were achieved without loss to their 
classmates. Further, in three classes, the 
gains for extension students radiated so as 
to benefit their classmates also. 

Analyses of variables less central to the 
treatment of extension students showed that 
teachers tended to ask them more questions 
during reading groups following treatment. 
This increase was not at the expense of 
classmates except in one classroom where 
the reading group response opportunities 
were more than doubled for extension stu- 
dents while being cut in half for their class- 
mates. In summary, the effects of the treat- 
ment for the extension students were rather 
general across teachers but were confined 
mostly to the measures of teacher behavior 
in staying with students following failure. 
For the most part, the advantages accruing 
to the extension students as a result of the 
treatment were not gained at the expense of 
classmates (although there was one excep- 
tion), and the treatment sometimes radiated 
to the benefit of classmates. 

The treatment regarding low-participa- 
tion students produced large gains in the 
frequencies of response opportunities and 
interactions afforded by the teachers. In a 
sense, these quantitative gains were at the 
expense of classmates, since the means for 
classmates went down in most classes, while 
the means for the low-participation stu- 
dents went up. However, the effect of the 
treatment was to more nearly equalize re- 
sponse opportunities and teacher-student 
contacts for low-participation students and 
their classmates rather than to make the 
teachers spend most of their time with the 
low-participation students and ignore their 
classmates. Further, as noted previously, 
even after improvement following treat- 
ment, most of the measures of frequency of 
contacts with teachers showed low-partici- 
pation students to be still behind their 
classmates in the second semester. 


Discussion 


Clearly, the first treatment goal—chang- 
ing specified teacher behavior with target 
children—was accomplished. Teachers sub- 
stantially increased the number of contacts 
they had with low participants, and they 
also increased the percentage of times they 
continued to work with extension students 
after they responded inadequately. Thus, 
the consultation strategy of making teach- 
ers aware of their differential behavior to- 
ward students in quasi-equivalent circum- 
stances was effective in changing teacher 
behavior. 

Teachers changed their behavior more to- 
ward low-participation students, even 
though during the treatment interview they 
expressed more concern about and interest 
in changing their behavior toward extension 
students. This suggests that the basic hy- 
pothesis, that inappropriate teacher behavior is 
due simply to unawareness, might be more 
appropriate to the low-participation group 
than to the extension group. The data from 
the low-participation group suggest that the 
teachers were broken out of a rut by our 
treatment: the treatment made them see 
these students in a whole new way, and it 
also made the students respond somewhat 
differently than they had in the past. In 
contrast, while the treatment succeeded in 
getting the teachers to stay with extension 
group students longer, it did not remove 
some of the other negative treatment of 
these students and, in general, did not lead 
to such widespread or beneficial effects as 
the treatment with low participants. This 
may mean that teacher behavior toward ex- 
tension students is controlled by student be- 
havior and subtle affective teacher reaction 
to this behavior. 

Perhaps a more systematic feedback plan 
would have produced greater changes; how- 
ever, the data show that a relatively simple 
and brief treatment was sufficient to change 
the teacher behaviors specified in this 
study. The basic strategy—identifying and 
reinforcing certain desirable teaching be- 
haviors and requesting that teachers trans- 
fer them to additional students—appears 
potentially applicable for producing a wide 
variety of desirable changes in school set- 
tings. 

It is important to note that the investiga- 
tors alerted the teachers to their differential 
treatment of different students* but only 
suggested that target students might re- 
spond more favorably if they were treated 
in the same way as contrast children. Some 
time was spent suggesting how teachers 
could change their interaction patterns with 
target children, but few specific techniques 
were prescribed. It was left to the teachers 
to decide whether and how to operationalize 
these principles (increase contact, stay with 
the student). It should also be noted that 
the treatment involved bringing in the 
teacher as a partner rather than attempting 
to manipulate her through deception or pro- 
vision of phony information. We think that 
this was partly responsible for the treat- 
ment’s success, as was the fact that all teach- 
ers were enthusiastic toward the study and 
perceived it as helpful and relevant. Thus, 
there is much to be gained from research 
which makes the investigator a partner 
and a resource person to the teacher rather 
than that which makes him a manipulator 
of the teacher. 

The effects of the treatment on student 
behavior were less clear than the effects on 
teacher behavior. There were signs that the 
low participants began seeking out the 
teacher more than they had previously, al- 
though the gains were relatively small. 
There was also slight evidence to show that 
extension students changed. Thus, both 
groups were influenced in some ways by the 
treatment program. Although the changes 
were small, the students did appear to respond 
reciprocally to improved teacher behavior. 


*This is not to suggest that teachers should 
treat all children in the same way. The point that 
students need different patterns of classroom con- 
tact with their teacher has been discussed at length 
elsewhere (Brophy & Good, 1974). In this paper, 
the term “treat the same way” is used to describe 
the idea that was presented to teachers: “You 
have already demonstrated that you can perform 
the behavior with similar students under similar 
circumstances; provide it for these students as 
well.” 

The effects of the treatment program on 
nontarget children are of great experimen- 
tal and practical interest. The value of 
changing teacher behavior toward two or 
three children is dubious if teacher interac- 
tion patterns with other children deterio- 
rate. However, teacher changes toward tar- 
get children typically did not interfere with 
other teacher-child interaction patterns. In 
fact, when teachers did change their behav- 
ior toward target children, they also tended 
to change their behavior (in the same direc- 
tion) toward nontarget children, at least on 
measures of qualitative aspects of teacher- 
student interaction. 

Another question of methodological inter- 
est is the possible contamination effect 
(either positive or negative) of the treat- 
ment upon teacher interaction with target 
children. Teachers may change their behav- 
ior toward target children in ways not spec- 
ified by the treatment (teachers asked to 
"stay with" may also ask more frequent 
and more difficult questions). The [illegible] 
and desirability of such contamination ef- 
fects seem worthy of experimental investi- 
gation. Such effects may or may not occur 
in a given treatment; however, an investi- 
gator cannot discover or study them unless 
he includes a wider variety of interaction 
variables than those involved in the treat- 
ment. 

Contamination effects were operative in 
this study and most, but not all, were posi- 
tive. However, there were some negative ef- 
fects. For example, teachers criticized ex- 
tension students, when they gave a wrong 
answer or when they made no response dur- 
ing reading group, more often than they 
had previously. Perhaps the teachers were 
doing the wrong thing for the right reason. 
That is, the teachers were determined to 
work with these students and to get 
academic responses; perhaps when the re- 
sponses were inadequate, teacher disap- 
pointment spilled out in the form of criti- 
cism. In any case, it seems necessary for 
future investigators, especially if they make 
complex treatment requests, to monitor 
teacher behavior for possible inappropriate 
changes. Treatments that make students 
more salient to their teachers may produce 
undesirable side effects. 

Finally, note that for both groups, teach- 
ers focused more on the academic behavior 
of the students as a result of the treatment. 
In both cases, the teachers worked for im- 
proved performance. However, they ap- 
peared to push and to challenge the exten- 
sion students while being much more sup- 
portive and protective with low partici- 
pants. 

In summary, the study has demonstrated 
that a simple consultation strategy for pre- 
senting teachers with feedback about their 
behavior was effective in changing both 
quantitative and qualitative teacher behav- 
ior toward target students. In addition, stu- 
dent behavior was influenced by the change 
in teacher behavior. The effect of the treat- 
ment on nontarget children was not detri- 
mental. Of special interest is that specified 
or requested change in teacher behavior to- 
ward target children is likely to be accom- 
panied by additional changes in teacher be- 
havior toward target and nontarget chil- 
dren. Predicting and controlling such effects 
should be of special concern to those who 
propose to change teacher behavior. Finally, 
a call has been made for more research 
which involves the classroom teacher as a 
research partner rather than as a subject to 
be manipulated. 


THE EFFECTS OF READABILITY ON ORAL AND SILENT 
READING RATES. 


ESTHER U. COKE* 
Bell Laboratories, Murray Hill, New Jersey 


The usefulness of reading rate as a measure of reading difficulty was 
evaluated in two studies relating reading rate to two text-derived 
measures of readability. In the first study, oral reading rate was unaf- 
fected by readability since the subjects read at a constant syllable rate. 
A second study attempted to increase the saliency of the readability 
variable by having the subjects rate each passage for comprehensibility. 
Passages were read both silently and aloud at a constant syllable rate. 
Comprehensibility ratings were correlated with readability indicating 
that the subjects were sensitive to readability. These results suggested 
that there are important limitations on the usefulness of reading rate 


as a measure of reading difficulty. 


Recent experimental evidence has chal- 
lenged the validity of the assumption that 
reading rate is a good behavioral index of 
the readability of English prose. When read- 
ing rate is measured in units smaller than 
a word, the rate remains constant over a 
wide range of difficulty (Carver, 1971; Mil- 
ler & Coleman, 1971; Sticht, 1971). The 
present experiment is a replication of the 
Miller and Coleman study, using a much 
larger sample of texts and more precise ex- 
perimental techniques. 

Reading rate is a potentially useful mea- 
sure in studies of language processing since 
rate can be easily derived from observations 
of reading. Therefore, it is important to de- 
termine the sensitivity of this measure to 
factors that influence the difficulty of proc- 
essing written materials. 


EXPERIMENT 1 
Method 


Material. Ninety prose passages differing in sub- 
ject matter and difficulty were selected from the 
following sources: 29 passages from the 36 used 
by Miller and Coleman (1971); 28 passages from 
the 330 used by Bormuth (1969); 12 selections 
from a high school physics text; 14 selections from 
randomly chosen professional journals in chemistry, 
electronics, and economics; and 7 selections from 
randomly chosen nontechnical books in a univer- 
sity library. These passages covered the following 
subject matters: narratives and popular science 
([number illegible] texts); history, civics, and 
literature (17 texts); economics (6 texts); biology 
and psychology ([number illegible] texts); 
[category illegible] (27 texts). Five additional 
passages were selected from the same sources and 
were used to control for warm-up effects. 


The author wishes to thank E. Z. Rothkopf 
for his advice and suggestions. Portions of this 
paper are based on a paper read at the meeting of 
the American Educational Research Association, 
New Orleans, February 1973. 

* Requests for reprints should be sent to Esther 
U. Coke, Bell Laboratories, 600 Mountain Avenue, 
Murray Hill, New Jersey 07974. 

Computer-generated negative microfilm prints 
were made of all passages except the Miller-Cole- 
man selections. These prints were mounted in 
slides. The Miller-Coleman passages had been 
typed and photographed, and the positive prints 
had been mounted in slides for another study. The 
computer-generated letters were slightly larger 
than the typewritten letters. 

Procedure. Each subject read all passages during 
two reading sessions held on separate days. At the 
beginning of the first session, the subject read ex- 
perimental directions which emphasized the need 
to read rapidly without sacrificing intelligibility. 
The subject was told that his reading was being 
recorded to check for clarity and that his reading 
times would be clocked. The subject then read 
aloud 3 of the warm-up passages and 26 of the 
experimental passages. After a five-minute rest, 
25 more experimental passages were read aloud. 
At the beginning of the second day's session, the 
experimenter reviewed the oral reading directions. 
Then the subject read aloud 2 of the warm-up 
passages and 14 experimental passages. After a five- 
minute rest, 25 more experimental passages were 
read aloud. 

Over both sessions, all the subjects read the 
same 5 warm-up passages in the same order. Each 
subject read the 90 experimental passages in a 
different random order. 
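
The presentation scheme described above (one random order of the 90 passages per subject, read in blocks of 26, 25, 14, and 25 across the two sessions) can be sketched as follows. This is an illustrative reconstruction only: the function name, the `seed` parameter, and the use of Python's `random` module are assumptions, and the paper does not say how the random orders were actually generated.

```python
import random


def session_blocks(passage_ids, block_sizes=(26, 25, 14, 25), seed=None):
    """Shuffle the passages for one subject and split them into reading blocks.

    block_sizes follows the counts given in the Procedure paragraph:
    two blocks on the first day and two on the second.
    """
    order = list(passage_ids)
    if sum(block_sizes) != len(order):
        raise ValueError("block sizes must add up to the number of passages")

    rng = random.Random(seed)  # hypothetical: one seed per subject
    rng.shuffle(order)

    blocks = []
    start = 0
    for size in block_sizes:
        blocks.append(order[start:start + size])
        start += size
    return blocks
```

Calling `session_blocks(range(90), seed=subject_number)` for each of the 20 subjects would give every subject a different order while keeping the block sizes fixed.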

Apparatus. The subject controlled the exposure 
time of all slides with a switch that also activated 
a timer. Reading time was recorded in tenths of 
a second. Each passage slide was followed by a 
blank slide so that the subject could rest after 
reading aloud. The slides were shown on a rear- 
projection screen. 

Subjects. Twenty Montclair State College under- 
graduates served as paid volunteers for the study. 

Text measures. Two measures of length were ob- 
tained for each passage: the total number of 
words and an estimate of the total number of 
syllables. The syllable count was estimated from 
the number of vowels according to a formula 
used by Coke and Rothkopf (1970). Two indices 
of readability were calculated for each passage: 
the average word length in syllables and the 
average sentence length in words. The simplified 
Flesch reading ease score (Flesch, 1949) was also 
calculated. This score is the weighted sum of the 
average word length in syllables and the average 
sentence length in words. 
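
The text measures described above can be approximated in a few lines. The sketch below rests on two assumptions that are not taken from the paper: syllables are estimated by counting runs of vowel letters, which is a stand-in for the Coke and Rothkopf (1970) formula (not reproduced in the text), and the reading ease score uses the widely cited Flesch weights (206.835, 1.015, 84.6), which may differ from the simplified version the author computed.

```python
import re


def estimate_syllables(word):
    """Rough syllable estimate: one syllable per run of vowel letters.

    Hypothetical stand-in for the vowel-based formula cited in the text.
    """
    vowel_runs = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_runs))


def text_measures(passage):
    """Return length and readability measures for one passage."""
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    words = re.findall(r"[A-Za-z']+", passage)
    syllables = sum(estimate_syllables(w) for w in words)

    avg_word_length = syllables / len(words)      # syllables per word
    avg_sentence_length = len(words) / len(sentences)  # words per sentence

    # Standard Flesch reading ease weights (assumed, see lead-in).
    reading_ease = 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_word_length

    return {
        "words": len(words),
        "syllables": syllables,
        "avg_word_length": avg_word_length,
        "avg_sentence_length": avg_sentence_length,
        "reading_ease": reading_ease,
    }
```

Higher reading ease values indicate easier text, so a passage with longer words or longer sentences receives a lower score.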


Results 


To determine if the reading times for the 
29 Miller-Coleman passages differed from 
the times for the other 61 passages because 
of differences in typography, a one-way 
analysis of covariance was used with the 
total number of words, the average word 
length in syllables, and the average sentence 
length in words as covariates. The mean 
reading times of the two sets of passages 
did not differ significantly (F = .03, df = 
85, p > .05). Therefore, all 90 passages were 
used in the analysis of the data. 

Reading time and passage length. The 
mean reading time for each passage was 
calculated over all 20 subjects and corre- 
lated with both measures of passage length. 
Number of syllables was a better predictor 
of reading time (r = .94) than was number 
of words (r = .76). The difference between 
these correlations was statistically signifi- 
cant when the correlation between syllable 
and word length (r = .72) was taken into 
account. The large correlation between syl- 
lable length and reading time implies that 
subjects read all passages at a reasonably 
constant syllable rate. 
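
The paper does not name the test used to compare the two correlations. One common choice for two correlations that share a variable (here, reading time) is Hotelling's t for dependent correlations, sketched below on the assumption that the 90 passages are the units of analysis. The function name and argument names are illustrative.

```python
import math


def hotelling_t(r_xy, r_xz, r_yz, n):
    """Hotelling's t for the difference between two dependent correlations.

    r_xy and r_xz are the two correlations being compared (they share x),
    r_yz is the correlation between the two predictors, n is the sample size.
    Returns (t, degrees_of_freedom).
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    if det <= 0:
        raise ValueError("correlations are not mutually consistent")
    t = (r_xy - r_xz) * math.sqrt((n - 3) * (1 + r_yz) / (2 * det))
    return t, n - 3


# Values reported in the text: time-syllables .94, time-words .76,
# syllables-words .72, with 90 passages.
t_value, df = hotelling_t(0.94, 0.76, 0.72, 90)
```

With the reported values this gives a t of about 7 on 87 degrees of freedom, which is consistent with the statement that the difference was statistically significant.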

Reading rate and readability. Figure 1 
clearly shows that when reading rate was 
measured in syllable units, rate remained 
constant over a wide range of difficulty. 
When the two indices of readability were 
considered separately, neither average word 
length in syllables nor average sentence 
length in words was a good predictor of syl- 
lable rate. Average word length in syllables 
accounted for 12% (r = .34) of the variabil- 
ity in rate while average sentence length in 
words accounted for less than 1% (r = 
-.05) of the variability. 
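
The percentages of variability quoted here follow from squaring the correlation coefficient. A minimal sketch of that conversion, with a function name chosen only for illustration:

```python
def variance_explained(r):
    """Percentage of variability accounted for by a correlation r."""
    return 100 * r**2


# Correlations with syllable rate reported in the text.
word_length_pct = variance_explained(0.34)       # average word length
sentence_length_pct = variance_explained(-0.05)  # average sentence length
```

The sign of r drops out when it is squared, so a correlation of -.05 and one of +.05 account for the same small share of the variability.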

Figure 1 also shows that when reading 
rate was measured in word units, hard pas- 
sages were read more slowly than easy pas- 
sages. The large correlation (r = .90) be- 
tween difficulty and word rate can be 
attributed to differences in average word 
length in syllables. Average word length in 
syllables accounted for 83% (r = -.91) of 
the variability in word rate while average 
sentence length in words accounted for only 
26% (r = -.51) of the variability. 


Discussion 

The results clearly support Miller and 
Coleman’s (1971) conclusion that reading 
rate is constant over a wide range of diffi- 
culty when rate is measured in units smaller 
than a word. The finding that word rate de- 
creased with difficulty can be accounted for 
by the observed syllable rate constancy. 
Since subjects read at a constant syllable 
rate, words containing more syllables took 
longer to read than words with fewer syl- 
lables. Therefore, the harder passages, 
which had a larger proportion of longer 
words, were read more slowly when rate was 
measured in word units. 
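
The argument above amounts to a simple identity: words per minute equals syllables per minute divided by syllables per word. The sketch below illustrates it with made-up numbers; the syllable rate of 300 per minute and the two word lengths are hypothetical values chosen for the example, not data from the experiment.

```python
def words_per_minute(syllables_per_minute, syllables_per_word):
    """Word rate implied by a given syllable rate and average word length."""
    return syllables_per_minute / syllables_per_word


# Hypothetical constant syllable rate and two hypothetical passages.
SYLLABLE_RATE = 300.0
easy_rate = words_per_minute(SYLLABLE_RATE, 1.5)  # short words
hard_rate = words_per_minute(SYLLABLE_RATE, 2.0)  # long words
```

With the syllable rate held fixed, the passage with longer words necessarily yields the lower word rate, so a drop in words per minute on hard passages does not by itself show that readers slowed down.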

Two explanations of the observed reading 
rate constancy suggest themselves. One is 
that the output requirements of speaking 
make oral reading rate insensitive to pas- 
sage difficulty. Another explanation is that 
the reader does not need to understand a 
passage when he reads it aloud. In order to 
test the adequacy of these explanations, a 
second experiment was conducted. The 
reading task in this experiment involved 
comprehension and allowed for an inde- 
pendent evaluation of the reader's sensitiv- 
ity to passage difficulty. In addition, the 
passages were read both aloud and silently. 


Figure 1. Scatter plots showing reading rate in syllables per minute (filled dots) and in 
words per minute (open dots) as a function of passage difficulty. (Straight lines were fitted 
to the data by the method of least squares.) 
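
For readers unfamiliar with the fitting method named in the caption, a straight line can be fitted by least squares as sketched below. The function is a generic illustration with assumed names; it is not the author's analysis code, and no data from Figure 1 are reproduced.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations in x and sum of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope near zero for the syllable-rate line and a clearly nonzero slope for the word-rate line would correspond to the pattern the caption describes.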


EXPERIMENT 2 
Method 


Material. Thirty-two passages were selected 
from the 90 passages of Experiment 1. These 32 
passages covered the same range of length and 
difficulty as the larger set. Miller-Coleman (1971) 
passages were generated on microfilm by computer 
so that the passages did not differ in typography. 

Procedure. Each subject read all 32 passages 
under one of three reading conditions. One group 
(aloud-pronounce) read each passage aloud and 
judged how difficult the words were to pronounce; 
a second group (aloud-understand) read each 
passage aloud and judged how difficult the passage 
was to understand; a third group (silent-under- 
stand) read each passage silently and also rated 
the passage for comprehension difficulty. 

Each passage was rated on a 5-point scale 
ranging from very hard (1) to very easy (5). 
Passage slides alternated with blank slides. Ratings 
were recorded when the blank slide was on the 
screen. 

A short-answer comprehension test was given 
immediately after the reading session. Subjects 
were unaware that they would be tested. There 
were two questions about each passage. A control 
group (no reading) of 26 subjects took the test 
without reading the passages first. 

Reading directions were the same as in Ex- 
periment 1. Silent readers were urged to read each 
passage only once from the beginning to the end. 
All subjects read 2 warm-up passages followed by 
the 32 passages in random order. 

Apparatus. The same apparatus was used as in 
Experiment 1. 

Subjects. Eighty-six paid volunteer high school 
students participated in the study. Twenty sub- 
jects were assigned to each of the three reading 
conditions. 

Text measures. The measures of passage length 
and readability were the same as those in Experi- 
ment 1. 


Results 


Reading time and passage length. The re- 
sults for Experiment 1 were replicated. 
Under all reading conditions, subjects read 
the passages at a reasonably constant syl- 
lable rate (r = .99, .98, and .94, respec- 
tively, for conditions aloud-pronounce, 
aloud-understand, and silent-understand). 

Reading rate and readability. As in Ex- 
periment 1, reading rate in syllables was 
constant over the entire range of difficulty 
regardless of reading condition. Requiring 
subjects to evaluate comprehensibility did 
not make oral reading rate any more sensi- 
tive to readability than it was when sub- 
jects made pronounceability ratings or sim- 
ply read aloud as in the first experiment. 
Further, the removal of the output con- 
straint of speaking for the silent readers 
did not make syllable rate more sensitive to 
the readability indices. 

Comprehensibility and readability. Mul- 
tiple correlations were calculated between 
the average comprehensibility ratings of the 
passages and the two readability indices— 
average word length in syllables and aver- 
age sentence length in words. Jointly, these 
two indices accounted for 56% of the varia- 
bility in the ratings of oral readers (r = 
.75) and 58% of the variability of silent 
readers (r = .76). The results show that 
subjects were responsive to text features as- 
sociated with readability. 

In addition, subjects in all conditions re- 
membered a good deal of the passages’ con- 
tent. Subjects who read the passages an- 
swered over 50% of the postreading ques- 
tions correctly. Subjects who had never read 
the passages before (no-reading condition) 
answered less than 1% of the questions cor- 
rectly. 

These findings rule out an explanation of 
the observed syllable rate constancy in 
terms of superficial processing. Subjects in 
this experiment were not simply pronounc- 
ing words without comprehension when 
reading aloud, nor were they simply glancing 
at the texts when reading silently. 


Discussion 


The results of this study fully support 
the contention that both oral and silent 
reading rate remain constant across the en- 
tire range of difficulty of English prose when 
rate is measured in a unit smaller than a 
word (Miller & Coleman, 1971). The al- 
most universal practice of measuring rate 
in words can lead to spurious conclusions 
about the relationship between reading rate 
and readability. In this experiment, the 
large positive correlation between word rate 
and passage difficulty could be explained 
in terms of the constancy of reading rate in 
syllables. Educational researchers would be 
prudent to look at syllable rate when as- 
sessing the effects of readability on reading 
rate. 

While subjects read at a constant syllable 
rate in this experiment, it is entirely possi- 
ble that changes in instructions or other 
task variables might have caused the read- 
ers to adapt their rate to passage difficulty 
(Kershner, 1964; Miller & Coleman, 1971). 
This dependence of reading rate on set fac- 
tors places important limitations on the use- 
fulness of both oral and silent reading rate 
in studies of reading. 


The correspondence between the relative placement of the first six 
subtests of the Illinois Test of Psycholinguistic Abilities (ITPA) in 
factor space with their theoretical placement in the representational 
level of the ITPA model was investigated using data from the eight 
age levels presented in Paraskevopoulos and Kirk. The relative place- 
ment of the six subtests in factor space was unrelated to their positions 
in the model for the 3- and 4-year-old children. The placement of the 
subtests showed significant correspondences with the model for each of 
the six older groups—age 5 through 10. 


The Illinois Test of Psycholinguistic 
Abilities (ITPA; Kirk, McCarthy, & Kirk, 
1968; McCarthy & Kirk, 1961) is an in- 
strument for assessing the language abilities 
of children. The revised test is composed of 
12 subtests designed to measure a three- 
dimensional model of communication (clin- 
ical model). The test is conceptualized in 
terms of channels of communication, proc- 
esses, and levels of organization. The two 
channels incorporated in the test are the au- 
ditory-vocal and the visual-motor. The 
processes refer to reception, organization, 
and expression. The two levels of organiza- 
tion are the representational level and the 
automatic level. Items in the 6 representa- 
tional-level subtests are thought to require 
more abstract reasoning abilities, whereas 
the items at the automatic level are thought 
to be responded to in a more habitual man- 
ner. Each of the 6 subtests at the represen- 
tational level is designed in reference to a 
channel and a process. The automatic-level 


*Samuel A. Kirk generously agreed to allow us 
to use data from Tables A-1 through A-8 (Para- 
skevopoulos & Kirk, 1969) that made this study 
possible.

subtests are not considered to be as amena-
ble to measuring separate processes. An at-
tempt is made, however, to consider a single
channel, level, and process with each of the
subtests.

The ITPA model is a modification of a
theory presented by Osgood (1957a, 1957b).
The authors of the ITPA utilized 12 of the
48 possible combinations of channels, levels,
and processes suggested by the Osgood
model (Carroll, 1972). The representa-
tional-level subtests are Auditory Recep-
tion, Visual Reception, Auditory Associa-
tion, Visual Association, Verbal Expression,
and Manual Expression. The automatic-
level subtests are Visual Memory, Auditory
Memory, Visual Closure, Grammatic Clo-
sure, Auditory Closure, and Sound Blend-
ing. A more complete discussion of the sub-
tests may be found in the Examiner's Man-
ual (Kirk et al., 1968).

There has been little support for the
structure either experimentally or from fac-
tor-analytic studies of the ITPA (Da[...],
1972). The major emphases of the factor-
analytic studies of the ITPA have been to
determine whether the subtests of the ITPA
actually measure several different abilities
as has been claimed (Kirk & Kirk, 1[...])
and to determine whether factors obtained
lend support to the clinical model of the
ITPA.

Factor-analytic studies of the experimen-
tal edition of the ITPA (McCarthy & Kirk,
1961) have typically reported three or four
factors (Carroll, 1972; Meyers, 1969; Sil-
verstein, 1967). Differences in the factors at
different ages have been reported (Ryck-
man & Wiegerink, 1969; Silverstein, 1967);
more factors have been obtained with older 
children than with younger children (Ryck- 
man & Wiegerink, 1969); and social class 
appears to affect the factors obtained (Uhl 
& Nurss, 1970). It has been noted that the 
use of reference tests to extend the variance 
in the correlation matrix may increase the 
number of factors obtained and facilitate 
the interpretation of the factors (Carroll, 
1972; Meyers, 1969; Ryckman & Wieger- 
ink, 1969). 

Burns and Watson (1973) factor ana- 
lyzed the 12 subtests of the revised ITPA 
(Kirk et al., 1968), using a group of under- 
achieving children as subjects, and obtained 
five factors. They interpreted the factors 
as supporting existence of the visual-motor 
and auditory-vocal channels of the clinical 
model of the ITPA as well as supporting 
that of the process of expression. However, 
support for the reception and association 
processes and differentiation between the 
automatic and representational levels of the 
theoretical arrangement was not found. 

Two explanations for the lack of defini- 
tive support for the arrangement present 
themselves. The first is, of course, that the 
ITPA subtests may not yield data that con- 
form to the structure. The second possibil- 
ity which must be considered is that, al- 
though the ITPA may correspond to the 
model in some degree, the factor-analytic 
studies of the test scores have failed to find 
the correspondence. This might occur in two 
ways. 

First, the factor patterns obtained by an-
alytic methods of rotation (cf. Harman,
1970) may have failed to correspond to the
three dimensions of the configuration closely
enough to be recognized by visual in-
spection. This seems especially likely with a
structure, such as the ITPA model, that
predicts a uniform distribution of points in
factor space rather than a number of clus-
ters of points. In particular, the varimax 
method of rotation maximizes the variance 
of a factor by rotating it toward points 
relatively close together. A slight displace- 
ment of one point in the direction of an- 
other nearby point would tend to “attract” 
a factor, possibly causing the rotated factor 
to miss a theoretically more meaningful po- 
sition. The second possibility is that the 
three axes of the spatial configuration may 
not be measured orthogonally by the ITPA. 
If the three axes are correlated, an orthogo- 
nal factor analysis might find one or two of 
the dimensions of the structure and fail to 
indicate the others. 

Several of the studies mentioned above 
(Burns & Watson, 1973; Meyers, 1969; 
Ryckman & Wiegerink, 1969) have found 
partial although not definitive support for 
the clinical model of the ITPA. The present 
study deals directly with the correspond- 
ence between the ITPA and the clinical 
model underlying its development and 
interpretation by using a factor-analytic 
method that is unaffected by the idiosyn-
crasies of rotation and which should be tol-
erant of a moderate degree of correlation of 
the axes of the arrangement. This method 
has been recently employed to test the cor- 
respondence between a personality inven- 
tory and its underlying model (Wakefield 
& Doughtie, 1973). As there is reason to 
believe that the factors of the ITPA change 
with age, the correspondence is tested inde- 
pendently for the eight age groups pre- 
sented in Paraskevopoulos and Kirk
(1969). 


METHOD


Analysis of the Model in Terms of Distance 
Comparisons 


In order to test the correspondence of the sub- 
tests of the ITPA with the clinical model, it is 
necessary to consider the relationships between 
pairs of subtests in terms of distances between the 
corresponding points in the spatial configuration. 
The order relationships among the distances im- 
plied by the structure will then serve to define the 
model for the purpose of the test. 

The conceptualization, as presented by Para-
skevopoulos and Kirk (1969), implies order re-
lationships among the distances between the points
in the representational level of the model. Un-
fortunately, they were not specific about the posi-
tions of the automatic, “whole level” subtests. As
a result, no hypotheses about distances involving
these points were made. However, the positions of
the six subtests at the representational level were
clearly indicated. These six subtests were located
at the intersections of the two channels of com-
munication (auditory and visual) with the three
processes (reception, organization, and expression).
The representational level of the model is presented
in Figure 1.

[Figure 1 diagram not reproduced; axis label: PROCESSES
(Reception, Organization, Expression).]

FIGURE 1. Representational level of the Illinois
Test of Psycholinguistic Ability model. (Abbrevi-
ations: AR = Auditory Reception, VR = Visual Re-
ception, AA = Auditory Association, VA = Visual
Association, VE = Verbal Expression, ME =
Manual Expression.)

By constructing a complete graph of these six 
points (i.e., connecting every point with every
other point; cf. Coombs, Dawes, & Tversky, 1970,
pp. 78-94), all the interpoint distances in the 
theoretical arrangement are represented as lines 
connecting two points. A complete graph of six 
points contains 15 lines. The rectangular structure 
of the representational-level subtests clearly im- 
plies that certain of these lines should be longer 
than others. 

Seven of the lines in Figure 1 are short sides. 
These lines connect points that are in the same 
channel and adjacent processes or that are in the 
same process and different channels. Four lines are 
short diagonals connecting points in different chan- 
nels and adjacent processes. Two lines are long 
sides connecting points that are in the same chan- 
nel and in nonadjacent processes. The remaining 
two lines are long diagonals connecting points that 
are in different channels and nonadjacent proc- 
esses. 

The following order relations of interpoint
distances follow from the configuration. The two
long diagonals are longer than all the other lines.
The seven short sides are shorter than all the other 
lines. The short diagonals and the long sides are in- 
termediate in length. 

The lack of specificity in the literature on the 
ITPA concerning the relative separation among 
processes and between channels requires that two 


DOUGHTIE, WAKEFIELD, Jr., SAMPSON, AND ALSTON 


sets of order relationships be given special at-
tention. The first set is the comparisons of the two
long sides with the three short sides that connect
points in different channels. Depending on the
relative separation along the two dimensions, the
three short distances could be classified as short or
long. They were classified as short for two reasons.
The ITPA manual (Kirk et al., 1968) depicted the
two channels as closer than the reception and ex-
pression processes. A methodological reason
(Campbell & Fiske, 1959) is that the channels may
be considered different methods of measuring the
three theoretical constructs: reception, organization,
and expression. As the two methods may be dif-
ferent due to artifacts of measurement, placing the
two measures of each construct relatively close to-
gether demonstrates the validity of the three
processes.

The second set of order relationships due special
attention is the possible comparisons between the
short diagonals and the long sides. Following the
reasoning of the previous paragraph, it would be
desirable for the short diagonals to be shorter
than the long sides. However, the relationship be-
tween these two sets of distances is influenced by
factors other than the relative separation of points
along the two dimensions. The most important in-
fluences are the angles between the sides. A slight
deviation from orthogonality would change the
lengths of the diagonals without changing the
lengths of the sides. As lack of orthogonality of the
dimensions was considered a possible reason for
the inconclusive results of previous factor-analytic
study of the ITPA, comparisons so largely in-
fluenced by variations in the angles separating the
sides were avoided. Hence, comparisons of long
sides and short diagonals were not employed in the
test of the model.

The 15 interpoint distances are listed in Table
1 as members of three classes defined by their
relative lengths as derived from the model. The
classes are called the short, middle, and long
classes. Each of the seven short lines is shorter
than each of the six middle lines; this yields 42
pairs of ordered distances. Each of the six middle
lines is shorter than each of the two long lines;
this yields 12 pairs of ordered distances. In total,
54 pairs of ordered distances describe the rep-
resentational level of the model.
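
The counting argument above can be checked with a short sketch. It assumes only what the text states: each subtest sits at the intersection of one channel and one process, and the processes are ordered reception, organization, expression. The names `SUBTESTS` and `classify` are illustrative and do not come from the original study.

```python
from itertools import combinations

# (channel, process index); processes ordered reception=0, organization=1, expression=2
SUBTESTS = {
    "AR": ("auditory", 0), "VR": ("visual", 0),
    "AA": ("auditory", 1), "VA": ("visual", 1),
    "VE": ("auditory", 2), "ME": ("visual", 2),
}

def classify(a, b):
    """Return 'short', 'middle', or 'long' for the line joining subtests a and b."""
    (chan_a, proc_a), (chan_b, proc_b) = SUBTESTS[a], SUBTESTS[b]
    same_channel = chan_a == chan_b
    gap = abs(proc_a - proc_b)
    if gap == 2 and not same_channel:
        return "long"    # long diagonal
    if gap == 0 or (gap == 1 and same_channel):
        return "short"   # short side
    return "middle"      # short diagonal or long side

# Complete graph on six points: every unordered pair of subtests.
lines = list(combinations(SUBTESTS, 2))
classes = {"short": [], "middle": [], "long": []}
for a, b in lines:
    classes[classify(a, b)].append((a, b))

short_vs_middle = len(classes["short"]) * len(classes["middle"])
middle_vs_long = len(classes["middle"]) * len(classes["long"])
total = short_vs_middle + middle_vs_long

print(len(lines), {name: len(pairs) for name, pairs in classes.items()})
print(short_vs_middle, middle_vs_long, total)
```

Under these assumptions the sketch yields 15 lines in classes of 7, 6, and 2, and 42 + 12 = 54 ordered pairs, in agreement with the counts given in the text.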


Data 


Eight matrices of the intercorrelations of the
12 subtests of the revised ITPA from Paraskevo-
poulos and Kirk (1969, pp. 202-209) for eight dif-
ferent ages were used to test the model. The first
matrix was obtained on a sample of 107 children
aged 2 years 7 months to 3 years 1 month; the
second, on a sample of 116 children aged 3 years
7 months to 4 years 1 month; the third, on a
sample of 115 children aged 4 years 7 months to
5 years 1 month; the fourth, on a sample of [...]
children aged 5 years 7 months to 6 years 1 month;
the fifth, on a sample of 124 children aged 6 years
7 months to 7 years 1 month; the sixth, on a
sample of 123 children aged 7 years 7 months to 8
years 1 month; the seventh, on a sample of 127
children aged 8 years 7 months to 9 years 1 month;
the eighth, on a sample of 122 children aged 9 years
7 months to 10 years 1 month.


Factor Analyses 


In order to test the correspondence between the 
empirical placement of subtests in factor space 
and their theoretical arrangement against a null 
hypothesis of random placement, it was necessary 
to consider the dimensionality of the factor space. 
Shepard (1962a, 1962b) has shown that the dis-
tances among a set of N points can take any con-
ceivable rank order in a space of at least N - 1
dimensions. In order to allow the possibility of the
occurrence of any rank order of the distances
among the six representational-level subtests, a
space of at least five dimensions was necessary.
Thus, the first five principal components were ex- 
tracted from each of the eight matrices. The five 
components were rotated by the varimax method 
(Harman, 1970). 

The eight analyses accounted for approximately 
the same percentages of total variance at each age, 
ranging from 61.1% to 67.9%. 


Distances in Factor Space 


Euclidean distances from every point to every 
other point in factor space were computed by a 
formula representing a generalization of the fami-
liar Pythagorean theorem to multidimensional
space (cf. Wakefield & Doughtie, 1973).
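
As an illustration of the distance computation, the following sketch treats each subtest as a point whose coordinates are its loadings on the five rotated components. The loadings shown are invented placeholders, not values from Paraskevopoulos and Kirk (1969), and the function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean_distance(p, q):
    """Pythagorean distance generalized to any number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def interpoint_distances(loadings):
    """Map each unordered pair of subtests to its distance in factor space."""
    return {
        (a, b): euclidean_distance(loadings[a], loadings[b])
        for a, b in combinations(loadings, 2)
    }

# Hypothetical five-component loadings for three subtests (illustration only).
example_loadings = {
    "AR": [0.70, 0.10, 0.05, 0.20, 0.00],
    "VR": [0.10, 0.65, 0.15, 0.05, 0.10],
    "AA": [0.60, 0.20, 0.10, 0.30, 0.05],
}

for pair, d in interpoint_distances(example_loadings).items():
    print(pair, round(d, 3))
```

With all six representational-level subtests supplied, `interpoint_distances` would return the 15 distances that enter the order comparisons.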


TABLE 1

FIFTY-FOUR ORDER RELATIONS OF DISTANCES
DERIVED FROM THE ITPA MODEL

        42 ordered pairs           |        12 ordered pairs
Short        less    Middle        |  Middle       less    Long
distances    than    distances     |  distances    than    distances
AR-VR                VA-AR         |  VA-AR
AA-AR                AA-VR         |  AA-VR                ME-AR
VA-VR                ME-AA         |  ME-AA                VE-VR
VA-AA         <      VE-VA         |  VE-VA         <
VE-AA                VE-AR         |  VE-AR
ME-VE                ME-VR         |  ME-VR
ME-VA

Note. Abbreviations: ITPA = Illinois Test of
Psycholinguistic Ability, AR = Auditory Recep-
tion, VR = Visual Reception, AA = Auditory As-
sociation, VA = Visual Association, VE =
Verbal Expression, and ME = Manual Expression.

TABLE 2

PAIRS OF DISTANCES IN FACTOR SPACE
CORRESPONDING TO THE ITPA MODEL
AT EIGHT AGES

Age (years-months)    Pairs in predicted order    p
2-7 to 3-1            22                          ns
3-7 to 4-1            26                          ns
4-7 to 5-1            46                          .01
5-7 to 6-1            39                          .01
6-7 to 7-1            37                          .01
7-7 to 8-1            35                          .05
8-7 to 9-1            42                          .01
9-7 to 10-1           36                          .05

Note. Abbreviation: ITPA = Illinois Test of
Psycholinguistic Ability.

Comparing Orders of Obtained Distances to 
the Model at Each Age 


The order of each pair of distances in factor 
space for which the ITPA model yielded a 
specific prediction was obtained for each age group. 
The total number of empirical distances that 
showed the correct theoretical order was obtained 
for each age. 
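
A minimal sketch of this tally follows. It takes the three distance classes from Table 1 and counts how many of the 54 predicted orderings hold in a set of empirical distances. The distances used in the demonstration are invented to show the two extreme outcomes, and `count_correct_orders` is a hypothetical helper name.

```python
from itertools import product

# Distance classes as listed in Table 1.
SHORT = ["AR-VR", "AA-AR", "VA-VR", "VA-AA", "VE-AA", "ME-VE", "ME-VA"]
MIDDLE = ["VA-AR", "AA-VR", "ME-AA", "VE-VA", "VE-AR", "ME-VR"]
LONG = ["ME-AR", "VE-VR"]

# Each tuple is (distance predicted to be shorter, distance predicted to be longer).
PREDICTED = list(product(SHORT, MIDDLE)) + list(product(MIDDLE, LONG))

def count_correct_orders(dist):
    """Count predicted orderings that hold in the empirical distances `dist`."""
    return sum(dist[shorter] < dist[longer] for shorter, longer in PREDICTED)

# Invented distances that fit the model perfectly (illustration only).
perfect = {**{k: 1.0 for k in SHORT}, **{k: 2.0 for k in MIDDLE}, **{k: 3.0 for k in LONG}}
# Invented distances that reverse every prediction.
reversed_fit = {k: -v for k, v in perfect.items()}

print(len(PREDICTED), count_correct_orders(perfect), count_correct_orders(reversed_fit))
```

A perfect fit scores 54 and a fully reversed configuration scores 0; the counts reported for the eight age groups fall between these extremes.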


Significance of the Correspondence of the 
ITPA to the Model at Each Age 


In a space of five dimensions, any distance be- 
tween two of six points is equally likely to be 
greater than or less than any other distance be- 
tween two of the six points (Shepard, 1962a, 1962b).
Hence, each of the 54 comparisons of a theoretical
ordering of two distances with the corresponding
empirical order represents a Bernoulli trial (with
p = .5).

Using the normal approximation to a binomial
test for goodness of fit (Siegel, 1956, pp. 36-42), it
was found that 34 comparisons between two dis-
tances occurring in the predicted order were neces-
sary to reject the null hypothesis of random dis-
tance orders at the .05 level and 37 were necessary
to reject at the .01 level. Rejection of the null hy-
pothesis would, of course, support the contention 
that the ITPA subtests are interrelated in the 
fashion prescribed by the model. 
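
The two cutoffs can be reproduced with a brief sketch. The text does not say whether the test was one-tailed or whether a continuity correction was applied; the sketch assumes both, because that combination recovers the reported values of 34 and 37. The function name `min_correct_needed` is illustrative.

```python
import math
from statistics import NormalDist

def min_correct_needed(n, alpha, p=0.5):
    """Smallest count of correct orders that is significant at `alpha`.

    Assumes a one-tailed normal approximation to the binomial with a
    continuity correction of 0.5 (an assumption, not stated in the text).
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return math.ceil(mean + 0.5 + z_crit * sd)

print(min_correct_needed(54, 0.05))
print(min_correct_needed(54, 0.01))
```

With n = 54 and p = .5 the mean is 27 and the standard deviation is about 3.67, so the cutoffs work out to 34 and 37 correct orders.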


RESULTS 


Table A contains the eight factor pat- 
terns, the eight sets of interpoint distances 
in factor space, and the eight lists of correct 
and incorrect distance pair orders.* 


*A 10-page table containing the eight factor 
patterns, the eight sets of interpoint distances in 
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Correspondence to the Model 


Table 2 presents the number of pairs of 
distances that had the same order as that 
predicted from the model for each age level. 
For the two youngest levels, there seemed 
to be no relationship between the ITPA 
subtests and the representational level of 
the model. The number of correct orders by 
chance alone is 27 (half of the 54 observa- 
tions). The orders for both of the two 
youngest ages are close to (in fact, slightly 
below) chance level. The subtests of the 
ITPA did show significant correspondences 
to the representational level of the model 
for the six older groups of subjects. Four 
ages yielded significance at the .01 level 
and two at the .05 level. 
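
As a consistency check on Table 2, the sketch below converts each reported count into a one-tailed probability under the same assumed normal approximation (continuity-corrected, which the text does not state explicitly) and labels it at the conventional levels. The names `one_tailed_p` and `significance_label` are hypothetical.

```python
import math

def one_tailed_p(k, n=54, p=0.5):
    """Upper-tail probability of k or more correct orders (normal approximation)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd          # continuity correction assumed
    return 0.5 * math.erfc(z / math.sqrt(2))

def significance_label(k):
    prob = one_tailed_p(k)
    if prob < 0.01:
        return ".01"
    if prob < 0.05:
        return ".05"
    return "ns"

# Counts of correctly ordered pairs from Table 2, youngest group first.
table2_counts = {
    "2-7 to 3-1": 22, "3-7 to 4-1": 26, "4-7 to 5-1": 46, "5-7 to 6-1": 39,
    "6-7 to 7-1": 37, "7-7 to 8-1": 35, "8-7 to 9-1": 42, "9-7 to 10-1": 36,
}
labels = {age: significance_label(k) for age, k in table2_counts.items()}
for age, label in labels.items():
    print(age, table2_counts[age], label)
```

Under these assumptions the labels match the table: two nonsignificant groups, four groups significant at .01, and two at .05.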


Correspondence to the Model and Age of
Subjects

A Spearman rank-order correlation (Sie- 
gel, 1956, pp. 202-213) between the rank 
order of age and the rank order of the num- 
ber of correct distance comparisons was 
computed to find evidence of a develop- 
mental trend toward increasing correspond-
ence to the theoretical structure with in-
creasing age. The correlation (r = .36) was 
not significant (p > .05). 
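
The reported coefficient can be recomputed from the counts in Table 2. This sketch uses the textbook rank-difference formula for Spearman's coefficient, which is valid here because neither variable contains ties; the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank-order correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

age_order = [1, 2, 3, 4, 5, 6, 7, 8]                # youngest to oldest group
correct_orders = [22, 26, 46, 39, 37, 35, 42, 36]   # counts from Table 2

rho = spearman_rho(age_order, correct_orders)
print(round(rho, 2))
```

The squared rank differences sum to 54, giving 1 - 324/504, or about .36, which agrees with the value in the text.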


DISCUSSION


The clinical model of the ITPA was reli- 
ably approximated by the representational- 
level subtests on data obtained from the six 
oldest age groups treated by Paraskevopou- 
los and Kirk (1969) but not on data ob- 
tained from the two youngest age groups. 
The age groups that ranged from 2 years 7 
months to 3 years 1 month and from 3 
years 7 months to 4 years 1 month yielded 
patterns of the six representational-level 
subtests which had no demonstrable rela- 
tionship to the clinical model. The children 
in the six older groups, near their fifth,
sixth, seventh, eighth, ninth, or tenth birth-
days, yielded subtest patterns that approxi-
mated the pattern of the representational
level of the clinical model. 

An inspection of Table 2 suggests that 
factor space, and the eight lists of correct and in-
correct distance pair orders is available from the
senior author upon request.
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the groups did not gradually approach the 
theoretical arrangement with advancing
age, as an incremental view of language
development would predict. Rather, the
change from no relationship to the ITPA
conceptualization to that of a significant
correspondence to the model occurred
within a single year. There are, of course,
fluctuations in the degree to which each age
approximates the configuration, but no re-
lationship between correspondence to the
structure and the age of the subjects is sug-
gested except the one large change between
the group of children who were approxi-
mately four years old and the group who 
were approximately five. 

These results have bearing on the appro-
priateness of some interpretations drawn
from ITPA score patterns for children at
different ages. While interpretations based
on empirical prediction of a criterion are
not affected by the appropriateness or inap-
propriateness of the clinical model, more
theoretical interpretations based on the the-
oretical arrangement must be held suspect
for children who are four or younger.
Interpretations of the ITPA based on the
model may, however, be reasonably applied
to children who are approaching or have
passed their fifth birthday.

An important milestone in the develop-
ment of children is suggested by these re-
sults. Between the ages of 2 and 10 years,
there is apparently a single, rather sudden
change from a lack of correspondence to the
model of communication at the representa-
tional level to an enduring correspondence
to the theoretical structure. The age at
which the change occurs corresponds
roughly to the end of Piaget’s preconceptual
phase (of the preoperational stage) and the
beginning of the intuitive phase (Long-
streth, 1968). The important distinction be-
tween children at these levels is that the
child in the preconceptual phase is not able
to perform operations concerning classes,
relationships, and number, while the intui-
tive child can perform such operations, al-
though not as parts of larger systems of
operations. A closer look at the items com-
prising the subtests of the ITPA at the rep-
resentational level may reveal many items
that involve cognitive operations that chil-
dren younger than about 4 years of age
cannot perform.
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^A OF PROFESSORS! 
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This study investigated two research questions: (a) Would rewording 
items on a questionnaire for evaluating faculty teaching effectiveness 
substantially affect students' ratings? (b) Would students' ratings of
professors' teaching quality be totally consistent with their ratings of
benefits derived from courses? Subjects were 358 students in 21 classes 
at the University of South Florida who were administered two faculty 
evaluation questionnaires. Results showed that student ratings were 
affected very little by a major rewording of items and that a sub- 
stantial degree of linear independence existed between students' per- 
ceptions of the quality of an instructional process and their perceptions 
of the degree to which they benefited from the instructional process. 


Although scholarly publication has been 
a dominant criterion in judging the worth 
of faculty in many institutions, students 
(and in the case of public institutions, state 
legislatures) are now demanding that qual- 
ity and responsibility in teaching be given 
dominant or, at least, substantial weight. In 
Florida, the Board of Regents of the state 
university system has developed guidelines 
on criteria to be used for retention, tenure,
and promotion of faculty. The three areas 
of teaching, research, and service are to be 
weighted 70%, 20%, and 10%, respectively. 

It is self-evident that service activities 
are worthwhile and readily countable. The 
worth of research work can be judged by 
the adversary system of public debate and 
prepublication review procedures employed 
by scholarly journals (although some would 
argue that the judgment procedure is often 
far too lenient). When it comes to judging 
teaching quality, however, we are largely at 
a loss. We have no general agreement on 
good models, and the recent review works of 
Rosenshine (1970) and Gage (1972) assem- 
bled little evidence on teacher behaviors 


*The authors wish to express their appreciation 
to Ronald Register who helped in compiling and 
analyzing the data. 

* Requests for reprints should be sent to Richard 
M. Jaeger, College of Education, University of 
South Florida, Tampa, Florida 33620.

that consistently predict desirable changes
in students.

Scriven (1972) suggested that college
professors be judged by the outcomes they
produce, by their peers, and by their stu-
dents. Judgment by outcomes is a radical
idea in the evaluation of college profes-
sors, a procedure yet to be tried. While it
presents methodological problems of enor-
mous magnitude (e.g., standardization,
equating, and the establishment of utili-
ties), one can hope for progress in the com-
ing years. Serious evaluation of teaching
quality by a college professor's peers is ef-
fectively precluded by the closed-door [...]
of the university. As a result, peer evalua-
tion is often based upon unfounded [...]
and suffers from a composite halo effect
derived from the more observable activities
of university faculty. So, in the world of the
practical, we are left with students’ judg-
ments of a professor’s competence as a
teacher; such judgments are the subject of
this paper.

Student judgments of a professor’s teach-
ing ability are most often secured through
pencil-and-paper fixed-option question-
naires. Gustad (1961) indicated that among
584 colleges and universities surveyed, stu-
dent ratings as a means of evaluating fac-
ulty were used most often. Werdell (1967)
indicated that in recent years, students
have provided a strong impetus for the use
of student ratings in evaluating faculty. 
Eble (1970) found that student opinion 
questionnaires were the most commonly 
used form of teacher and course evalua- 
tions. 

A number of student rating forms have 
been developed during the last five decades. 
Excellent brief histories of the development 
of these instruments, along with a descrip- 
tion of a number of the instruments, are 
given by Werdell (1967) and Eble (1970). 
The content of these instruments is very 
similar; a set of questions on the behavior 
of the professor and the content and organi- 
zation of the course is typical. The struc- 
ture of teaching behaviors and course char- 
acteristics represented by questionnaires
has been explored by Isaacson et al. (1964), 
Meredith (1969), Caffrey (1969), and 
McKeachie (1969), using factor-analytic 
techniques. 

Meredith (1969) made the observation 
that there was need for a set of items in 
teacher rating scales which would measure 
a class of variables associated with “effects 
on the learner.” In a more recent study, 
Hartley and Hogan (1972) pursued Mere- 
dith’s suggestion by supplementing the tra- 
ditional process-oriented questionnaire with 
items that addressed students’ perceptions 
of their “progress and performance as re- 
lated to this course.” Upon factor analyzing 
the results, they found that all of the fac- 
tors were defined exclusively by items from 
either the “process-oriented” part of the 
questionnaire or the “performance” part. 
They indicated that this clear separation of 
items into process and performance factors 
suggests the need for continued exploration 
and expansion of instruments for securing 
student evaluations of courses. 

In this paper, two research questions are 
explored. The first is, If the items on a 
questionnaire for student rating of profes- 
sors are reworded, will substantial changes 
in the rated abilities of professors result? 
And the second, as a partial replication of 
the work of Hartley and Hogan, is, If some 
items on a questionnaire for student rating 
of professors concern the process of instruc- 
tion and others concern the outcomes of in- 
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struction, will the process items and the 
outcome items define mutually exclusive, 
orthogonal factors? That is, will the varia- 
bles that correlate highly with some factors 
be exclusively of the “process” type, and 
will the variables that correlate highly with 
other factors be exclusively of the “out- 
come” type? Such a result would indicate a 
substantial degree of linear independence 
between the outcome variables and the 
process variables and, therefore, a substan- 
tial degree of exclusiveness between profes- 
sors rated highly on the process variables 
and those rated highly on the outcome vari- 
ables. 


METHOD 


Instruments 


Two rating scales were used to obtain student 
opinions on professors, courses, and the student's 
own development. The first scale, an officially ap- 
proved form at the University of South Florida, is 
composed of 17 items, 9 pertaining to the instruc- 
tor of the course and 8 pertaining to the course. 
The items of the rating scale were as follows: 


Standard Questionnaire 


Please mark the following 17 items: (1) Below 
Average, (2) Average, (3) Above Average, (4) 
Superior, (5) Excellent. 


Instructor of Course                          1 2 3 4 5

1. Instructor’s knowledge of
course

2. Instructor’s preparation for
class presentations

3. Clarity of instructor’s class
presentations

4. Instructor’s enthusiasm for
course

5. Relationship of instructor's
presentations to course ob-
jectives

6. Instructor’s stimulation of
students’ thinking, consistent
with course objectives

7. Instructor’s fairness and ob-
jectivity in grading

8. Instructor’s interest in stu-
dents

9. Instructor's overall effec-
tiveness

Course

10. Clarity of course objectives

11. Relationship of course con-
tent to course objectives

12. Organization of course

13. Assignments appropriate to
level of course

14. Amount of assignments for
level of course

15. Evaluation appropriate to
level, content, and objectives
of the course

16. Amount of evaluation for
level of course

17. Course's overall effectiveness


The second scale was developed by the authors 
for this study. It too contains 17 items, 9 under 
the heading “the process of teaching" and 8 under 
the heading "the product of teaching". 

Examples of items in both sections of the re- 
vised scale are displayed.

The product items on the revised questionnaire
are of two types. Pairs of items deal first with the
student's perception of his/her skill development
as a result of taking the course and then with the
student's perception of the professor's facilitation
of his/her skill development. Three pairs of items
are addressed to the knowledge and comprehension
levels of the cognitive taxonomy developed by
Bloom et al. (1956), and one pair of items con-
cerns the acquisition of values, following the work
of Krathwohl, Bloom, & Masia (1964).

In the standard questionnaire, a common 5-
point scale follows each item: the first option is
labeled “1, below average;” the second, “2, aver-
age;” the third, “3, above average;” the fourth,
“4, superior;” and the fifth, “5, excellent.” The
scale is implicitly asymmetric and attempts to
spread a set of ratings that have, in the past,
clustered at the high end of the scale. The items
are stated as short phrases but never as complete
sentences. Examples include “organization of
course” and “instructor’s enthusiasm for course”.


A. The Process of Teaching

24. In comparison to other instructors you have had (college), how would you rate the clarity of this
instructor’s presentations and lectures in class?
(1) Among the least clear (bottom 10%); (2) Not as clear as most (bottom 30%); (3) About average;
(4) Clearer than most (top 30%); (5) Among the clearest (top 10%)

26. Did the material presented by this instructor (both in class and through any outside assign-
ments) correspond to the objectives stated for this course?
(1) Not at all; (2) Very little; (3) To some degree; (4) Substantially; (5) Completely

29. In comparison to other instructors you have had (college), how would you rate the quality of
this instructor’s evaluation of students’ knowledge and skills?
(1) Among the poorest (bottom 10%); (2) Poorer than most (bottom 30%); (3) Average;
(4) Better than most (top 30%); (5) Among the best (top 10%)

32. If you had the opportunity to take another course from this instructor, would you do so?
(1) Definitely not; (2) Probably not; (3) I don't know; (4) Probably so; (5) Definitely so

B. The Product of Teaching

33. By taking this course, did you gain knowledge which you consider to be valuable and useful?
(1) Definitely not; (2) Probably not; (3) Possibly so; (4) Probably so; (5) Definitely so

34. To what extent was your gaining of new knowledge facilitated by the instructor?
(1) Not at all; (2) Very little; (3) To some degree; (4) Substantially; (5)

39. By taking this course, did you acquire a set of educational values which are different from the
value structure you had previously?
(1) Definitely not; (2) Probably not; (3) Possibly so; (4) Probably so; (5) Definitely so

[...]
(1) Not at all; (2) Very little; (3) To some degree; (4) Substantially; (5)

All of the process items were constructed to parallel a professor rating questionnaire previously
used at the University of South Florida. Five of these items have parallels in a questionnaire cur-
rently used:

Standard Questionnaire Item

Clarity of instructor’s class presentations
Below Average (1); Average (2); Excellent (5)

Organization of course
Below Average (1); Average (2); Excellent (5)

Instructor’s enthusiasm for course
Below Average (1); Average (2); Excellent (5)

Instructor’s fairness and objectivity in grading
Below Average (1); Average (2); Excellent (5)

Instructor’s overall effectiveness
Below Average (1); Average (2); Excellent (5)

The process items in the revised questionnaire, 
while structurally similar to those in the standard 
questionnaire, differ in several respects. Each item 
is composed of at least one complete sentence, 
usually in the form of a question. The symmetric 
5-point rating scales make use of adjectives that 
are grammatically consistent and logically consistent
with the item stems. The items on the standard
questionnaire were revised in these ways in
an attempt to make the questionnaire less ambiguous
and to increase the consistency of students'
interpretations of items. If this objective
could be achieved, the error variance of response 
due to misinterpretation of questions would be 
reduced, and the reliability of students’ responses 
would be correspondingly increased.

Revised Questionnaire Item


In comparison to other instructors you have had
(college), how would you rate the clarity of this
instructor’s presentations and lectures in class?
Among the least clear          Among the clearest
(bottom 10%)                   (top 10%)

In comparison to other instructors you have had
(college), how would you rate this instructor’s
organization of this course? (Did the sequence of
topics and the mode of presentation help you in
learning the material presented?)
Among the poorest              Among the best
(bottom 10%)                   (top 10%)
(1)                            (5)

In your opinion, is this instructor's attitude toward
teaching this course, as reflected by his (her)
enthusiasm in classroom presentations
Extremely negative             Extremely positive
(bottom 10%)                   (top 10%)
(1)                            (5)

In comparison to other instructors you have had
(college), how would you rate the quality of this
instructor’s evaluation of students’ knowledge
and skills?
Among the poorest              Among the best
(bottom 10%)                   (top 10%)
(1)                            (5)

In comparison to other instructors you have had
(college), how would you rate the overall effectiveness
of this instructor as a teacher?
Among the poorest              Among the best
(bottom 10%)                   (top 10%)
(1)                            (5)


Although the adjectives associated with the
scale points on the standard questionnaire are
implicitly norm referenced, the population to be
used in making normative judgments is not identified.
In contrast, most of the process items in the
revised questionnaire use the phrase, “in comparison
to other instructors you have had (college),”
to identify a specific reference group and
to emphasize the normative basis of the judgements
requested. Since the results of student ratings
are often interpreted normatively, it is
important that students rate their professors
comparatively.

Wherever feasible, a numerical referent (bottom 
10%, bottom 30%, etc.) has been associated with 
each scale point in the revised questionnaire, in 
addition to the digits 1 through 5 associated with
the adjectives in the standard questionnaire. It
is likely that students interpret the numbers associated
with the scale points of the standard questionnaire
ordinally, since they are associated with
adjectives that are ordinally arranged on a scale
of desirability. Students cannot confidently assume
these numbers, and their associated adjectives,
to have interval scale properties. Is “5, excellent”
as far above “4, superior” as “3, above
average” is above “2, average”? Such an assumption
is unwarranted. Normative numerical referents
were used in the revised questionnaire in an 
attempt to reduce the ambiguity of the scale points 
in the standard questionnaire. 


Sample 


A 50% systematic random sample of faculty in 
the College of Education, University of South 
Florida, was asked to participate in the administration
of both instruments. Usable results were obtained
from 21 faculty members (about a fourth of
the designed sample). While one might suspect 
that participating faculty represent those most con- 
fident about their teaching abilities, there is no 
reason to suspect that lack of representativeness 
will differentially bias the comparison of the stan- 
dard questionnaire and the new questionnaire. The 
responding faculty provided ratings from 358 stu- 
dents in 16 undergraduate courses and 5 graduate 
courses. The subject matter of the courses included
physical education, elementary education, educa- 
tional foundations, science education, and social 
studies education. 


Procedure 


Faculty participation in the study was requested
by letter. Those who agreed to participate were
provided with standard questionnaires and revised
questionnaires for a single class. Those whose last
names began with the letters A through M were
asked to administer the standard questionnaires
first, to have the standard questionnaires collected,
and then to administer the revised questionnaires.
Those whose last names began with the letters N
through Z were asked to distribute the revised
questionnaires first, to have them collected, and
then to administer the standard questionnaires.
Students completed both questionnaires in a single
class period, under the supervision of a fellow student.
Student ratings were then delivered by students
to a scoring service, and faculty were informed
of results several weeks after they completed
student evaluations.


RESULTS AND DISCUSSION


Our first research question, “If the items 
on a questionnaire for student rating of 
professors are reworded, will substantial 
changes in the rated abilities of professors 
result?”, can be explored in several ways. If 
we focus on individual teaching behaviors, 
normative inquiries and absolute inquiries 
lead to different modes of analysis. The 
normative question can be put as follows: 
“If items are reworded, will there be substantial [...]


TABLE 1

SPEARMAN RANK-ORDER CORRELATION
COEFFICIENTS FOR PAIRED ITEMS
WITH INDICATED DESCRIPTORS

Descriptor                           Correlation coefficient

Clarity of instruction
Organization of course
Professor’s enthusiasm
Quality of grading
Professor's overall effectiveness


[...] a Spearman rank-order correlation coefficient was
computed for each pair of items. The rank-order
correlations range from a low of [illegible]
for the items dealing with instructor enthusiasm
to a high of .92 for the items dealing
with clarity of instruction. If we square
these correlations, it appears that at least
half of the variation in ratings of professors’
behavior on these dimensions can be
predicted from reworded items. These results
provide confidence in the parallelism
of the paired items, show extraordinary
parallel-forms reliabilities for single-item
scales, and tend to convince us that the
rewording of items used here doesn’t make
much difference in the ranking of professors.
We suspect that these correlations are
nearly as high as their identical-form
test-retest reliabilities will allow.


EVALUATION OF PROFESSORS


When administered in the University of
South Florida, College of Education, 5-point
rating scales on professors’ behaviors
generally resulted in median ratings close to
4. This may be due to superior teaching
(one might presume that education faculty
practice what they preach), or it might be
artifactually induced by the rating scales
used. If everyone is rated above average, an
average rating begins to lose meaning as an
anchor point. Suspecting that the [...]
might be explained in part by the absence
of numerical referents in the scale-point anchors
(below average, average, superior, excellent,
etc.), we expected that the use of
numerical anchors would provide a more 
symmetric distribution of ratings. After all, 
could many more than 10% of the profes- 
sors be “among the best (top 10%)”? If we 
generalize from ratings of professors in the 
College of Education, this is apparently so. 
While our absolute results cannot be gener- 
alized to the entire college because of the 
self-selection of our respondents, we can 
compare the distribution of responses on 
paired items. Because the numerical scales 
on the two questionnaires have different 
points associated with average performance, 
we have reparameterized the distributions 
about their respective averages. Table 2 
shows percentages of professors rated above 
average for these items. 
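
The rank-order comparison of paired items summarized in Table 1 can be illustrated with a short computation. The sketch below is not the authors' procedure or data: the five pairs of ratings are hypothetical, and the function names are introduced here only for illustration. It ranks each set of ratings (averaging tied ranks), correlates the ranks, and squares the coefficient to estimate the proportion of variation in one set of ranks that is predictable from the other. By the same arithmetic, the reported high of .92 corresponds to about .85.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean ratings of five professors on one pair of items.
standard_item = [3.8, 4.1, 4.4, 4.6, 4.9]
revised_item = [3.9, 3.7, 4.5, 4.2, 4.8]

rho = spearman(standard_item, revised_item)
print(round(rho, 2))       # 0.8
print(round(rho ** 2, 2))  # 0.64, the share of rank variation in common
```
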

While the revised questionnaire did pro- 
duce slightly more symmetric distributions 
of ratings (a nearly 10% difference in pro- 
portions rated above average on “organi- 
zation of course"), the differences were not 
practically significant. 

TABLE 2
PERCENTAGE OF 21 PROFESSORS RATED ABOVE
AVERAGE ON PAIRED VARIABLES WITH
INDICATED DESCRIPTORS

                                          % above average
Descriptor                            Standard         Revised
                                      questionnaire    questionnaire
Clarity of instruction                89.5             81.3
Organization of course                91.0             81.6
Professor's enthusiasm                96.9             96.4
Quality of grading                    90.9             81.4
Professor’s overall effectiveness     91.9             86.1


In reviewing the completed questionnaires,
a tendency among many students to
rate a professor consistently high or consistently
low on all items was noted,
evidence of a halo effect. Since it was suspected
that few professors are consistently
high on all behaviors, the degree to which
paired items on the two questionnaires
caused students to differentiate among
rated behaviors was examined. A principal-
components analysis of the paired items on
the standard questionnaire and the paired
items on the revised questionnaire was completed,
and it was reasoned that a more
complex factor structure would indicate a
reduction in composite halo effect. Table 3
indicates the cumulative proportions of variance
accounted for by successive numbers
of principal components for the paired
items on the standard questionnaire and the
revised questionnaire. While a single factor
accounts for 68% of the variance among
professors on the standard items, it accounts
for 60% of the variance among professors
on the revised items. Again it can be
seen that the difference is in favor of the
revised questionnaire, but the magnitude of
the difference casts doubt on its educational
significance.


TABLE 3
CUMULATIVE PROPORTION OF VARIANCE
ACCOUNTED FOR BY INDICATED NUMBER
OF PRINCIPAL COMPONENTS FOR PAIRED
VARIABLES ON STANDARD AND
REVISED QUESTIONNAIRES

No. principal              Questionnaire
components             Standard     Revised
1                       .68          .60
2                       .80          .76
3                       .89          .87
4                       .95          .94
5                      1.00         1.00
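
The cumulative proportions in Table 3 can be unpacked into the share of variance contributed by each successive component, which makes the comparison of the two factor structures easier to read. The sketch below uses the Table 3 values; the function names are introduced here for illustration, and the eigenvalues in the last example are an assumption (they are the values implied by the standard-questionnaire column only if a correlation matrix of the five items was analyzed, so that the eigenvalues sum to 5).

```python
def cumulative_proportions(eigenvalues):
    """Cumulative share of total variance, largest component first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


def increments(cumulative):
    """Share of variance added by each successive component."""
    previous, added = 0.0, []
    for value in cumulative:
        added.append(round(value - previous, 2))
        previous = value
    return added


# Cumulative proportions as reported in Table 3.
standard = [0.68, 0.80, 0.89, 0.95, 1.00]
revised = [0.60, 0.76, 0.87, 0.94, 1.00]

print(increments(standard))  # [0.68, 0.12, 0.09, 0.06, 0.05]
print(increments(revised))   # [0.6, 0.16, 0.11, 0.07, 0.06]

# Assumed eigenvalues (summing to 5) that would reproduce the standard column.
assumed_eigenvalues = [3.4, 0.6, 0.45, 0.3, 0.25]
print([round(p, 2) for p in cumulative_proportions(assumed_eigenvalues)])
# [0.68, 0.8, 0.89, 0.95, 1.0]
```

On the revised items the first component contributes less and each later component slightly more, which is the direction the authors describe as favoring the revised questionnaire.
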

The second research question explored 
the relationships among students’ ratings of 
professors on process variables and outcome 
variables. Those variables concerned with a 
professor’s teaching behaviors and his orga- 
nization and conduct of a class were termed 
process variables, and those variables con- 
cerned with students’ perceptions of the 
benefits they derived from a class were 
termed outcome variables. Based on the 
work of Hartley and Hogan (1972), it was 
hypothesized that there would be some de- 
gree of linear independence among process 
variables and outcome variables. From this 
it follows that professors rated most highly 
on process variables would not necessarily 
be rated most highly on outcome variables. 

To explore this question, a principal-
components analysis of the entire revised
questionnaire was completed. It was assumed
that all of the variation to be factored
was contained in a common factor
space, and the varimax procedure was used
to rotate the principal-components solution.
A summary of results is contained in
Table 4.


RICHARD M. JAEGER AND TOM D. FREIJO


TABLE 4
VARIABLE NUMBERS, FACTOR LOADINGS AND BRIEF DESCRIPTIONS OF VARIABLES LOADING
HEAVILY ON FIVE FACTORS EXTRACTED FROM A MATRIX OF STUDENT
RATINGS OF PROFESSORS AND COURSES

Factor   Variable number   Loading       Description of variable
1        25                .75           organization of course
         24                .74           clarity of instruction
         26                .74           relationship, materials to objectives
         30                .64           instructor’s overall effectiveness
         31                .57           recommend to fellow student
2        39                .86           acquisition of new educational values
         40                .85           instructor facilitates value acquisition
3        35                .87           gain ability to relate past knowledge
         36                [illegible]   instructor facilitates ability to relate past knowledge
4        28                .82           enthusiasm of instructor
         27                .70           willingness and availability of instructor
         31                .93           recommend to fellow student
         32                .53           take another course from this instructor
         29                .50           quality of evaluation procedures
5        38                .80           instructor facilitates comprehension of general concepts
         37                .72           learns to comprehend general concepts

Note. The variables shown are those with loadings in excess of .5. It was assumed that all variation
was contained in a common factor space. Five factors accounted for 73% of the total variance.

The first factor was defined by 5 items 
with loadings of at least .5. Note that all of 
these items concern the process of instruc- 
tion—organization, clarity, sticking to ob- 
jectives, overall effectiveness, and an opera- 
tional judgement of overall quality. We 
were tempted to label this factor “judged 
managerial effectiveness of the professor". 
Factor 2 was defined by 2 related questions 
on students' perceptions of their acquisition 
of new educational values and the profes- 
sor’s role in facilitating that acquisition. 
Like Factor 2, Factor 3 was defined by stu- 
dents’ perceptions of their progress. Only 2 
items had loadings in excess of .5 on Factor 
3: students’ perceptions of their gain in 
ability to relate and integrate past knowl- 
edge, and their feelings that the professor 
facilitated this gain. Factor 4 can be labeled
a behavioral dimension. Three of the
5 items concerned the professors’ enthusiasm,
willingness, availability, and fairness
in grading; the other items could be characterized
as operational endorsements of these
qualities. Finally, the fifth rotated factor
was defined by a pair of items concerned
with the acquisition of general concepts and
the professor's facilitation of that acquisition.
The five rotated factors accounted for
73% of the variance on the 17 items.

Our findings on the relationships among
process and outcome variables were consistent
with those of Hartley and Hogan
(1972). Process judgements and outcome
judgements appear to define unique dimensions,
since in no case did a process item
and an outcome item load heavily on the
same factor. Since the mutually orthogonal
factors obtained were defined exclusively
by a set of process variables or a set of
outcome variables, it follows that the variables
themselves exhibit a substantial degree
of linear independence.




A similar interpretation of the factor- 
analysis results leads to the conclusion that 
students perceive courses as contributing to 
their self-development in different ways. 
This follows since variables concerned with 
specific kinds of student benefits defined 
mutually orthogonal factors. It is reassur- 
ing to note that students view professors as 
important in contributing to their course- 
related development. 

In summary, then, our study shows that 
students’ ratings of professors are little af- 
fected by the rewording of items used here 
and that the typical rating questionnaire on 
the process of instruction should be supple- 
mented by a questionnaire containing out- 
come items.


PREDICTION OF PRODUCTIVITY FROM
PERSONALITY TEST SCORES 


CHARLES F. ELTON* AND HARRIETT A. ROSE


University of Kentucky 


This study investigated the relationship between the quantity of non- 
academic achievements and personality test scores. The total non- 
academic score—the dependent variable—was derived from nine non- 
academic achievement scales on the American College Test. The 
predictor variables consisted of 14 scale scores from the Omnibus
Personality Inventory (OPI) and 1 academic aptitude measure. The 
relationship between the predictor and dependent variables was 
analyzed by a stepwise regression analysis for a sample of 505 men and 
496 women. In a replication, the number of students consisted of 
1,329 men and 1,046 women. The Social Extroversion, Estheticism, 
and Impulse Expression scale scores on the OPI were the three best 


predictors in each analysis. 


In their relatively brief history, the merit 
of nonacademic achievement or accomplish- 
ment scales has been examined by several 
investigators. The first published study 
using these scales was that of Holland 
(1961). He reported the intercorrelations be- 
tween 72 predictors and 3 criterion measures 
in a sample of National Merit finalists. The 
predictors consisted of personal, demo- 
graphic, and parental variables, and the 
criterion measures consisted of 1 measure of 
academic talent (high school grades) and 2 
measures of nonacademic talent (scientific 
and artistic performance). Examples of 
items in these latter 2 measures are as fol- 
lows: gave an original paper at a scientific 
meeting sponsored by a professional society 
and won literary award or prize for creative 
writing, respectively. Students were asked to 
check those items which applied to them. 
Although Holland noted the difficulties in- 
herent in defining creativity, he labeled the 
scientific and artistic scales as measures of 
creative performance. A negligible correla- 
tion was found between academic and crea- 
tivity scores. Similar correlations were re- 
ported between creativity and personality 
scores, the latter measured by Gough's Differential
Reaction Scale and tests [...]
from Barron's Complexity-Simplicity, Independence
of Judgment, and Originality [...].
Median correlations were reported as [illegible]
and .14 for boys and girls, respectively.


* Requests for reprints should be sent to Charles
F. Elton, Room 111, Dickey Hall, University of
Kentucky, Lexington, Kentucky 40506.


In an extension of this study, undergraduate
achievements in scientific, artistic,
leadership, and scholastic areas were predicted
from data obtained in the senior year
of high school for four independent samples
of National Merit finalists over one-, two-,
three-, and four-year college intervals (Holland
& Astin, 1962). It was found that the
best single predictor of each of these
kinds of college achievement was similar
achievement in high school. Among the predictors
were personality test scores obtained
from the Sixteen Personality Factor Questionnaire,
California Psychological Inventory
(CPI), and an early form of the Omnibus
Personality Inventory (OPI). The
median correlations reported between these
predictors and achievements in areas of
leadership, art, and science were [illegible]
and .02, respectively.

In another related study, Holland and
Nichols (1964) employed multiple regression
analysis to predict college achievements
including grades, leadership, [...]
dramatic arts, literature, music, [...]
from a high school assessment of interests,
goals, activities, and aptitudes for a large
sample of high-aptitude students. Again, 
similar nonacademic accomplishments in 
high school were found to be significant pre- 
dictors of college accomplishment. 

Subsequently, the relationship between 
nonacademic and academic accomplishment 
was studied by Holland and Richards 
(1965), Richards, Holland, and Lutz (1967), 
Werts (1967), Elton and Shevel (1969), 
Wallach and Wing (1969), and Wing and 
Wallach (1971). Nonacademic talent was 
found to be independent of academic talent 
for a variety of student populations in the 
studies reported by Holland and his colleagues.
Werts (1967), however, reported
finding a relationship between nonacademic
and academic achievements and suggested
that an existing relationship might have been
masked in previous studies by
the use of the correlation coefficient as the
principal analytic technique. Other investigators
later supported the independence between
these two types of accomplishments
(Elton & Shevel, 1969; Wallach & Wing, 
1969). 

The relationship between personality 
variables and nonacademic achievements
has been reported as negligible (Holland,
1961; Holland & Astin, 1962). These results,
however, are subject to two possible qualifications.
First, marked changes have occurred
in the nonacademic achievement
scales over the past ten years, that is, items
have been altered in some scales and the
number of scales has been expanded. Second,
personality traits may predict quantity of
nonacademic accomplishments, if it is assumed
that nonacademic achievement scales
are intercorrelated. The best evidence for 
this assumption is derived from an inspec- 
tion of the intercorrelations between non- 
academic achievements in the areas of 
science, leadership, art, music, speech, and 
writing which suggests that nonacademic 
talent is more general than specific in its 
occurrence. For example, the median correlation
among nonacademic achievements
for males is .37 (Holland & Richards, 1965, 
Table 2). The purpose of this study is to 
investigate the relationship between the 
quantity of nonacademic achievements and 
personality test scores. 


METHOD 


Multiple regression analysis was employed to 
establish the relationship between the predictor 
variables—personality and ability measures—and 
the dependent variable—total number of self-re- 
ported nonacademic accomplishments. 

The predictors consisted of 14 scale scores on 
the OPI and a measure of academic aptitude. 


OPI Tests 


The OPI, Form F, contains 385 items designed 
to assess selected attitudes, values, and interests of 
college students, chiefly relevant in the areas of 
normal ego-functioning and intellectual activity. 
The specific scales of the OPI include the following 
measures: 


1. Thinking Introversion (TI). Persons scor- 
ing high on this scale are characterized by a 
liking for reflective thought and academic ac- 
tivities. 

2. Theoretical Orientation (TO). High scor- 
ers indicate a preference for dealing with the- 
oretical concerns and problems and for using 
the scientific method in thinking. 

3. Estheticism (Es). High scorers endorse 
statements indicating diverse interests in ar- 
tistic matters and activities and a high level of 
sensitivity and response to esthetic stimula- 
tion. 

4. Complexity (Co). High scorers are tol- 
erant of ambiguities and uncertainties; they 
are fond of novel situations and ideas. 

5. Autonomy (Au). High scorers show a 
tendency to be independent of authority as 
traditionally imposed through social institu- 
tions. 

6. Religious Orientation (RO). High scorers 
are skeptical of conventional religious beliefs 
and practices and tend to reject most of them, 
especially those that are orthodox or funda- 
mentalistic in nature. 

7. Social Extroversion (SE). High scorers 
display a strong interest in being with people, 
and they seek social activities and gain satisfaction
from them.

8. Impulse Expression (IE). High scorers 
have an active imagination, value sensual re- 
actions and feelings, and are aggressive. 

9. Personal Integration (PI). High scorers 
admit to few attitudes and behaviors that characterize
socially alienated or emotionally disturbed
persons.

10. Anxiety Level (AL). High scorers deny
that they have feelings or symptoms of anxiety,
and they do not admit to being nervous or
worried.

11. Altruism (Am). High scorers are affiliative
persons who are trusting and ethical in
their relations with others.

12. Practical Outlook (PO). High scorers
are interested in practical, applied activities
and tend to value material possessions and
concrete accomplishments.

13. Masculinity-Femininity (MF). High 
scorers (masculine) deny interests in esthetic 
matters, and they admit to few adjustment 
problems.

14. Response Bias (RB). High scorers respond
in a manner similar to a group of students
who were explicitly asked to make a
good impression. 


The reliabilities of these scales range from .67 
to .89; a complete description of their development 
as well as validity data may be found in the OPI 
Manual (Heist and Yonge, 1968). 


Academic Aptitude 


The American College Test (ACT) Composite 
score was used as a predictor also. This variable 
was included because the relationship between abil- 
ity and the recently added nonacademic achieve- 
ment scales of athletics, practical skills, and work 
experience has not been investigated. 

The dependent variable was the total number of 
self-reported, nonacademic accomplishments taken
from the Student Profile section of the ACT. These 
self-reported or claimed accomplishments are 
grouped in nine scales each with seven items. The 
student claims to have accomplished a given ac- 
tivity by responding either “yes” or “no” to that 
activity. 

Maxey and Ormsby (1971) investigated the va- 
lidity of 28 self-reported achievement items in 
large samples of males and females. Items were 
selected from the nonacademic scales which could
be verified from either high school records or by 
high school personnel. For men, item agreement 
between student self-report or claim and school report
ranged from a low of 70.5% for the item
“actively campaigned to elect another student” to
a high of 97.7% for the item “named to an all state
team.” These same two items defined the extreme 
percentage agreements among women; for example, 
the low was 67.7% while the high was 99.5%. These 
investigators interpreted their findings as indicating 
excellent agreement between student claim and 
school report for items in which schools could be expected
to keep accurate records. Some items, however,
reflected activities in which either the school
did not keep good records or it was difficult for
school personnel to ascertain whether or not the 
student engaged in a specific activity. It should be 
pointed out that Maxey and Ormsby examined the 
validity of only 28 items, hence the validity of 35 
items is unknown. 

The “yes” responses were summed over the 9
scales to provide a total, self-reported nonacademic
accomplishment score. The use of a total nonacademic
achievement score as the dependent variable
assumed that the 63 self-reported items in the 9
scales could be treated as 1 nonacademic achievement
scale. The justification for this procedure follows.
First, each scale consisted of 7 items ranging
from common and less important accomplishments
to rarer and more important accomplishments. In
the absence of any published data on the degree of
intercorrelation between the 63 items, independence
between the items was assumed. Second, the correlations
between the 9 separate scale scores and the
total number of self-reported achievements were
examined. The median correlation for men was [illegible];
for women it was .49.
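
The construction of the dependent variable described above (one point for each “yes” across 9 scales of 7 items, for a possible range of 0 to 63) can be sketched as follows. This is an illustration only: the scale label "Writing" is an assumption, since that scale's name is not legible in this copy, and the function name and the sample student are hypothetical.

```python
# Scale labels follow the sample items in the text; "Writing" is assumed.
SCALES = [
    "Leadership", "Music", "Speech", "Art", "Writing",
    "Science", "Athletics", "Work experience", "Practical skills",
]
ITEMS_PER_SCALE = 7


def total_accomplishment_score(responses):
    """Sum the 'yes' answers over all nine scales (possible range 0 to 63).

    responses maps each scale name to a list of seven 'yes'/'no' answers.
    """
    total = 0
    for scale in SCALES:
        answers = responses[scale]
        if len(answers) != ITEMS_PER_SCALE:
            raise ValueError(f"{scale}: expected {ITEMS_PER_SCALE} answers")
        total += sum(1 for answer in answers if answer == "yes")
    return total


# Hypothetical student who claims two music items and one athletics item.
student = {scale: ["no"] * ITEMS_PER_SCALE for scale in SCALES}
student["Music"] = ["yes", "yes", "no", "no", "no", "no", "no"]
student["Athletics"] = ["yes"] + ["no"] * 6

print(total_accomplishment_score(student))  # 3
```

Because every item carries the same weight, a rare accomplishment counts no more than a common one, which is why the score measures quantity rather than quality of accomplishment.
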
Sample items from the nine self-reported nonacademic
achievement scales follow:


Leadership: appointed to a student office;
organized a school political group or campaign;
participated in a nonschool political
campaign.

Music: composed music; performed with a
professional musical group (orchestra, band,
choral group); performed in a school musical
group.

Speech: placed first, second, or third in a
regional or state speech or debate contest;
entered a school speech or debate contest; [...]
leads in high school or church-sponsored play.

Art: finished a work of art (painting, ceramics,
sculpture, etc.) on my own (not as part
of a course); exhibited a work of art in a statewide
or regional show; had photographs, drawings,
or other artwork published in a public
newspaper or magazine.

Writing: [...] an original but unpublished piece
of creative writing on my own (not as part of a
course); won literary award or prize for creative
writing.

Science: did an independent scientific experiment
(not as part of a course); won a prize or
award (of any kind) for scientific work or
study; placed first, second, or third in a regional
or state science contest.

Athletics: participated in one or more varsity
athletic team events (football, basketball,
baseball, etc.) while attending high school;
earned a varsity letter in one or more sports in
high school; received all-city, league, county,
or state team award (including honorable mention).

Work experience: worked regularly for pay;
earned one or more raises or promotions because
of good work; did a job most people my
age couldn't do.

Practical skills: paid a bill, purchased something
with a money order, or balanced a checkbook;
made a useful item by sewing; can repair
an automobile if I have the right equipment.


The reliabilities of the first six scales range from
[illegible] to .89 over a one-month interval and from
[illegible] to .89 over a three-month interval (Technical
Report for the ACT Assessment Program, 1973).

Freshman OPI scores for 505 men and 496
women were chosen for analysis. These students
had entered as freshmen in 1971 and had persisted 
in college for at least five semesters. A stepwise 
multiple regression analysis was used, separately 
for men and women, to predict the total nonaca- 
demic accomplishment score. 

The analyses described above were replicated 
with the entire entering freshman class of 1970 for 
whom all test data were available (ns = 1,329 men 
and 1,046 women). 

In the regression analysis, the .01 level of significance
was chosen for the inclusion of additional
variables which reduced the residual variance. 
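
The entry rule described above, in which a predictor is added only if it significantly reduces the residual variance, can be sketched as a forward selection loop. This is a minimal illustration under stated assumptions, not the program the authors used: the data are artificial, the function names are hypothetical, and the significance test is expressed as a partial F statistic compared with a caller-supplied critical value (the default of 6.63 is approximately the .01 critical value of F with 1 numerator degree of freedom in large samples).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the predictors are not perfectly collinear.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    design = [[1.0] * n] + columns
    k = len(design)
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n))
            for j in range(k)] for i in range(k)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * design[i][t] for i in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))


def forward_stepwise(predictors, y, f_crit=6.63):
    """Return predictor names in order of entry under a partial F rule."""
    chosen, remaining = [], dict(predictors)
    current = rss([], y)
    n = len(y)
    while remaining:
        best_name, best_rss = None, current
        for name, column in remaining.items():
            trial = rss([predictors[c] for c in chosen] + [column], y)
            if trial < best_rss:
                best_name, best_rss = name, trial
        df_resid = n - len(chosen) - 2  # intercept, chosen, and the candidate
        if best_name is None or df_resid <= 0:
            break
        f_stat = (current - best_rss) / (best_rss / df_resid)
        if f_stat < f_crit:
            break
        chosen.append(best_name)
        del remaining[best_name]
        current = best_rss
    return chosen


# Artificial data: y depends strongly on x1, weakly on x2, and not on x3.
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, 1, -1, 1, -1, 1, -1, 1]
y = [-4.1, -3.9, -1.9, -2.1, 2.1, 1.9, 3.9, 4.1]

selected = forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)  # ['x1', 'x2']
```

With a sample as small as this toy example, the appropriate critical value would be larger than 6.63; it would be passed through `f_crit`.
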


RESULTS 


Tables 1 and 2 show the means, standard 
deviations, and intercorrelations between 
the independent and dependent variables. 
Results for men are above the diagonal, re- 
sults for women are below. The correlations 
in both tables range from low to moderate. 
The highest correlation between an OPI 
scale and ability was found on the measure 
of Autonomy; this existed for both sexes in 
both tables. It is worth noting that most 
OPI scales correlate higher with the total 
nonacademic accomplishment score than
does the measure of ability. The higher
mean self-reported nonacademic accomplishment
score found in Table 1 compared
to that found in Table 2 may be a result of 
the different student populations, that is, 
persisting versus entering students. 

The results of the regression analysis predicting
total self-reported nonacademic accomplishments
for the 1971 persisting freshmen
are shown in Table 3; similar data are
provided in Table 4 for the 1970 entering 
freshmen. In each analysis, the three best 
predictors of total self-reported non- 
academic achievements are Social Extrover- 
sion, Estheticism, and Impulse Expression. 
The multiple correlations for persisting 
freshmen are appreciably higher than they 
are for entering freshmen. A question sug- 
gested by this finding is, What is the re- 
lationship between self-reported nonaca- 
demic accomplishments and persistence in 
college? 


DISCUSSION

Social Extroversion, Estheticism, and Impulse
Expression are the three best predictors
of the quantity of self-reported accomplishments
for both sexes and for both entering
and persisting freshmen. On the basis of
these personality scales, the student with a 
large number of self-reported accomplish- 
ments may be described as follows: likes 
other people and derives stimulation from 
contact with others; possesses an interest in 
art, painting, sculpture, music, and litera- 
ture; and expresses his impulses readily 
either through conscious thought or in overt 
action. 

MacKinnon (1968) suggested that com- 
petencies and self-reported nonacademic 
achievement items such as those used as the 
dependent variable in this study are more 
promising predictors of future creative per- 
formance than are measures of intelligence. 
If, as MacKinnon suggests, nonacademic ac- 
complishments are potential indicators of 
future creative performance, the findings in 
this study are supported by those reported 
by Parloff, Datta, Kleman, and Handlon 
(1968). These investigators administered 
the California Psychological Inventory to a 
sample of adolescents judged to be creative 
on the basis of their Science Talent Search 
projects. Test results on the California Psy- 
chological Inventory were obtained also for 
samples of creative mathematicians, re- 
search scientists, writers, and architects. 
Separate factor analyses of the California 
Psychological Inventory for the adolescent 
and adult samples revealed similar factor 
structures. Both the creative adults and the 
more creative adolescents, however, were 
distinguished by factors labeled adaptive 
autonomy and assertive self-assurance. 
Among their conclusions, these investigators 
noted, “Creative performance among the 
adolescents appears to be facilitated rather 
than inhibited by a measure of social skills 
and self-control [p. 548].” While the measures
of nonacademic accomplishment used
in this study cannot be equated with the 
quality or social importance of the measure 
of creativity used by Parloff et al. (1968), 
the correspondence between their descrip- 
tion of the more creative adolescent and the 
description in this study of the high non- 
academic achiever is apparent. 

TABLE 3
BETA WEIGHTS AND MULTIPLE CORRELATIONS FOR
PREDICTING TOTAL NONACADEMIC
ACCOMPLISHMENT AMONG 1971
PERSISTING FRESHMEN

                                   Men (n = 505)    Women (n = 496)
Predictor                            β       R        β       R
Social Extroversion                .518    .593       ?       ?
Estheticism                        .223    .647       ?       ?
Impulse Expression                 .131    .654       ?       ?
Theoretical Orientation            .123    .660       ?       ?
Autonomy                          -.106    .667       ?       ?
American College Test Composite      ?       ?        ?       ?

Note. ? = value not legible in the source; the legible values are assumed to belong to the men's columns.

The congruence between the Parloff et al.
(1968) findings and those in this study may
be explained partially by a study reported 
by Skager, Schultz, and Klein (1965) on 
self-reported nonacademic accomplishments
as measures of creativity. These investi- 
gators developed items similar in content 
and emphasis to those used by Holland 
(1961) in his original study. Rare as well as 
relatively common activities were included 
in their checklist of accomplishments. In 
addition, students (142 males from a state
university and 150 males from a selective
technological institution) were asked to
describe briefly the activity checked. Quantity
and quality scores for each subject were
obtained. Raters examined each activity de- 
scription and designated which was the most 
significant achievement. These were then 
sorted by five judges, including three psy- 
chologists, one physicist, and one artist, into 
one of six categories of quality. Instructions 
to the judges were as follows: (a) the 
achievement implies originality as opposed 
to replication; (b) it is self-initiated rather 
than organized or planned by someone else; 
(c) it involves official recognition and/or 
public performance; and (d) there is evi- 
dence of tangible product over learning or 
passive curiosity. The correlations found be- 
tween quantity and quality were .44 and .29 
for the state university and technological 
institution samples, respectively. The dependent
variable in this study was one solely of
quantity, but the moderate correlation between
quantity and quality suggests that students who
report large numbers of nonacademic
accomplishments may also
be reporting a few rare or unusual accom- 
plishments. 

Nonacademic accomplishments have
been designated as measures of creative performance
(Holland, 1961; Skager et al.,
1965; Taylor & Ellison, 1972). The impor- 
tance of the criterion problem in measuring 
creativity has been emphasized by Brogden 
and Sprecher (1964) who stated, 


But is the creative person the same as the produc- 
tive person? Minimal productivity is probably 
necessary before a person’s creativity can be identi- 
fied and recognized by society. There may also 
prove to be at least a moderately high correlation 
between productivity and creativity [p. 156]. 


In view of the fact that the criterion prob- 
lem has not been resolved, it would appear 
desirable to designate self-reported non- 
academic accomplishment scales as mea- 
sures of productivity rather than as 
measures of creativity. Thus far, the relationship
between college nonacademic
achievements and postcollege measures of
creativity has not been investigated. It will
be necessary to establish this link before
nonacademic achievements in high school
or college may be considered as potential 
predictors of creativity. Dellas and Gaier 
(1970) state in their review of research on 
creativity, 

It also appears that there is a necessity to develop
creativity measures based on personality study
rather than task performance. . . . That biographical
items and past achievement have been rated
as the most efficient predictors (Taylor and Holland,
1964) does not mean investigators have no
further work in this direction [p. 70].


TABLE 4
BETA WEIGHTS AND MULTIPLE CORRELATIONS
FOR PREDICTING TOTAL NONACADEMIC
ACCOMPLISHMENT AMONG 1970
ENTERING FRESHMEN

                                      Men             Women
Predictor                            β       R        β       R
Estheticism                        .239    .365     .181      ?
Social Extroversion                .282    .469     .281      ?
Impulse Expression                 .154    .483     .253      ?
Response Bias                     -.093    .491     .187      ?
Personal Integration               .145    .495       ?       ?
Religious Orientation                ?       ?        ?       ?
American College Test Composite      ?       ?        ?       ?

Note. ? = value not legible in the source; sample sizes are also not legible. Leading decimal points in the women's β column are inferred. Two further entries, read as 092 (Religious Orientation) and 072 (American College Test Composite), cannot be assigned to a column.


The findings of this study indicate a
relationship between self-reported task performance
and personality test scores. Unanswered,
however, is the relationship between
measures of task performance and
creativity.


Ninety-six low-socioeconomic-status students in Grades 5, 8, and 11
were presented two classification tasks related to mathematics. Task 1 
required students to state how geometric forms presented successively 
were alike and different. The bases on which students indicated like- 
nesses and differences were classified as perceptible, attribute, nominal, 
or fiat. Task 2 was a free-sorting exercise in which the students ar- 
ranged 26 geometric forms successively into seven groups. Unlike mid- 
dle- and high-socioeconomic-status students, the low-socioeconomic- 
status students classified significantly more on the basis of perceptible 
likenesses and differences among concept examples than on the more 
mature bases of the defining attributes and the names of concepts. 
Low socioeconomic status and lack of instruction were related to the 
patterns of conceptual development. 


Bruner, Goodnow, and Austin (1956) and 
later researchers have shown that individu- 
als learn about and deal with their complex 
world by rendering things equivalent; that 
is, the individual learns to treat many 
things as equivalent by identifying certain 
common properties while ignoring other dis- 
tinguishable characteristics. The treating of 
things as equivalent, usually referred to as 
classificatory behavior or equivalence for- 
mation, cannot be undervalued for its prac- 
tical significance in educational and other 
settings. Understanding the bases by which 
maturing individuals classify things is of 
significance to the science of human learn- 
ing and also to the design of classroom in- 
structional practices. 


According to Klausmeier, Ghatala, and
Frayer (1974), the lowest level of classification
involves generalizing that at least
two different objects, events, or processes 
are alike in some way and discriminating 
that these same two things are different 
from other things. The highest, exhaustive 
level of classification enables the individual 
to identify the things he encounters as ex- 
amples and nonexamples of any particular 
concept he has formed. Children learn to 
classify from the lowest to the highest levels
from early childhood through adolescence.

In a somewhat analogous manner and
from a developmental point of view, Piagetian
theory holds that “classification implies
a relation of resemblance between
members of the same class, and one of
dissimilarity between members of different
classes [Inhelder & Piaget, 1964, p. 5].”
Classificatory operations arise through
structural development and represent an
important aspect in the attainment of concrete
operational thought (Flavell, 1963).

More explicitly than Piaget, other psychologists
have emphasized the importance
of experiential and cultural influences in
learning to classify (Bruner et al., 1966;
Evans & Segall, 1969; Klausmeier et al.,
1974; Schmidt & Nzimande, 1970; Suchman,
1966). In particular, Maccoby and
Modiano (1966) and Greenfield, Reich, and 
Olver (1966) have reported substantial dif- 
ferences in the classificatory behaviors of 
groups drawn from various cultures but 
have noted also that differences among 
these groups tend to decrease with increas- 
ing educational experience and urban influ- 
ence. 

The present experiment was designed to 
learn more about the classificatory behav- 
iors of low-socioeconomic-status children 
from three age groups. It was carried out as 
the second of two experiments to ascertain
whether the changing bases of classificatory 
behavior in low-socioeconomic-status chil- 
dren were comparable to those reported in 
middle- and high-socioeconomic-status chil- 
dren by Wiviott (1970). 

Wiviott (1970) patterned the first experi- 
ment after studies by Olver (1961) and 
Rigney (1962). Wiviott devised two tasks 
for the experiment. The first task consisted 
of a successive presentation of instances of 
eight geometric concepts: square, rectangle,
rhombus, parallelogram, quadrilateral, triangle,
circle, and cube. The instances were
presented either figurally with line draw- 
ings or verbally with the names of the eight 
concepts. One half of the subjects were pre- 
sented eight cards, each with a drawing of a 
concept instance printed on the card, while 
the other half were presented eight cards, 
each with the typewritten name of the con- 
cept. In the experiment, subjects were strat- 
ified according to three grade levels—5, 8, 
and 11—and two levels of mathematics 
achievement—above and below the median 
score of the grade group—so that one half 
the subjects high in mathematics achieve- 
ment at each grade level received each 
treatment as also did one half the subjects 
low in achievement. 

In accordance with Olver’s (1961) methods,
the subject was shown the first two
items, square and rectangle, and was asked 
how they were alike. Next, rhombus was 
presented and the subject was then asked
how this differed from the first two items
and how all three were alike. This continued
until all items were administered, with
cube representing the final contrast item.
On this item, the subject was asked only 
how the figure was different from the earlier 
ones. 

Rigney’s (1962) methods served as a pro- 
totype for Wiviott’s (1970) second task. 
The same concepts as in the first task were 
used with the exception of cube. A 26-card 
array of drawings was constructed, using 
instances of the concepts that varied in size 
and orientation. The subject was instructed
to form successive groups of drawings that 
were alike in some way and was then asked 
how they were alike. After the subject com- 
pleted a grouping and explained the ration- 
ale underlying it, the cards were replaced in 
their original array and the subject was 
asked to form another different group. This 
procedure continued until the subject had 
formed seven groups. 

Responses of the subjects on Tasks 1 and 
2 were recorded during the experiment. Sub- 
sequently each response was categorized as 
perceptible, attribute, nominal, or fiat. 

Wiviott (1970) reported that the method 
of presenting the instances, the grade level 
of the subjects, and the achievement level 
were all related significantly to the bases of 
classification employed by the subjects. 
Subjects who were presented the examples 
figurally gave more perceptible responses 
than those who were given the examples 
verbally. An increase in grade level from 
fifth to eighth to eleventh was accompanied 
with a decrease in the perceptible basis of 
classification and an increase in the attri- 
bute and nominal bases. This result agrees 
with the conclusions of Olver (1961) and 
Rigney (1962). Wiviott also found that 
higher achievers in mathematics made 
fewer perceptible responses than lower 
achievers. Thus, Wiviott supported the earlier
conclusions of Olver and Rigney and
added information about grade level and
achievement level, both of which
matter in educational settings.

The present study was undertaken to as- 
certain whether the bases of classification 
found by Wiviott in middle- and high-socioeconomic-status
children also held for low-socioeconomic-status
children. The grade
level of the subjects, the experimental materials,
and the experimental procedures
were kept identical to Wiviott’s experiment
so that the performance of the two socioeconomic
groups could be compared. The present
experiment is now explained more fully.


METHOD 


Subjects 


Ninety-six low-socioeconomic-status students 
from various schools in a city of 36,000 population 
served as subjects in this study. They were enrolled 
in the fifth, eighth, and eleventh grades—32 subjects 
in each grade. The total population of low-socio- 
economic-status subjects enrolled in these grades 
was initially identified by the school principals in 
consultation with the school counselors. Between 60 
and 80 children at each grade level were identified.
Next, the occupation of the head of the household 
in the homes of each of the children was rated by the 
first author of this paper using the Warner (1960)
7-point occupation scale. Students assigned
occupational ratings in Categories 6 and 7 were judged to
be suitable for the study; the few who received 
ratings of 5 were not included. The 32 subjects at 
each grade level who participated in this study were 
selected randomly from this low-socioeconomic- 
status population of students. 

It might be well to note that Wiviott conducted 
her experiment in a different city of the same 
state. Eighty-five percent of the parents of her 
subjects were in occupational Categories 1, 2, and
3, and 15% were in Categories 4 and 5. Thus there
was no overlapping in the occupational categories 
of the two groups of subjects. 

In the present experiment, the subjects were 
randomly assigned within each grade level to one 
of two methods of presentation—figural or verbal— 
with an equal number of male and female sub- 
jects being assigned to each method. Thus, there 
were 16 subjects in each possible combination of 
grade level (5, 8, and 11) and method of
presentation (figural and verbal).
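
The assignment procedure described above can be sketched in code. The following is a minimal illustration only, not the authors' actual procedure: the roster, the subject identifiers, and the even split of 16 males and 16 females per grade are assumptions made for the example, and the function name `assign_methods` is hypothetical.

```python
import random
from collections import Counter


def assign_methods(subjects, seed=0):
    """Randomly split subjects into figural/verbal within each grade-by-sex stratum."""
    rng = random.Random(seed)
    # Group subjects by (grade, sex) so each stratum is split separately.
    strata = {}
    for s in subjects:
        strata.setdefault((s["grade"], s["sex"]), []).append(s)
    assignment = {}
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # First half of the shuffled stratum goes to figural, the rest to verbal.
        for i, s in enumerate(shuffled):
            assignment[s["id"]] = "figural" if i < half else "verbal"
    return assignment


# Hypothetical roster: 32 subjects per grade; the 16/16 sex split is an assumption.
subjects = [
    {"id": f"g{grade}-{sex}{k}", "grade": grade, "sex": sex}
    for grade in (5, 8, 11)
    for sex in ("M", "F")
    for k in range(16)
]

assignment = assign_methods(subjects, seed=1)
cell_sizes = Counter((s["grade"], assignment[s["id"]]) for s in subjects)
print(cell_sizes)
```

Shuffling within each grade-by-sex stratum, rather than across the whole sample, is what keeps the number of males and females equal in the two methods at every grade level.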


Experimental Materials


The materials used in this experiment were the 
same as reported earlier in the study of Wiviott 
(1970). The cards used in Task 1 represented
instances of eight geometric concepts: square,
rectangle, rhombus, parallelogram, quadrilateral,
triangle, circle, and cube. The instances consisted of
either line drawings or printed names of the con- 
cepts. The 26 cards used in Task 2 consisted of 
drawings, using instances of the eight concepts 
that varied in size and orientation.


Procedure 


Tasks 1 and 2 were administered consecutively
and individually to subjects. About 20 to 30 minutes
were required for each administration. Testing
was conducted in private rooms in each of the
schools. As in Wiviott’s experiment, responses of
subjects in Task 1 were recorded on a [...],
while in Task 2, a verbatim written record was
kept of the cards selected for each sort and the
responses subjects gave regarding the basis of
sorting.

Procedures for the first task were patterned after
the studies of Olver (1961) and Wiviott (1970).
The first two cards, square and rectangle, were
placed on a table before the subject, and the subject
was asked to explain how the two were alike.
The third card, rhombus, was then presented, and
the subject was asked to explain how it was different
from the first two and how they were alike.
This procedure continued until all cards were administered,
with cube functioning solely as a contrast
item. In all, there were six questions involving
likenesses and six questions involving differences
among the concept instances. For the verbal
method of presentation, the names of concepts
were routinely pronounced for the subject; for
the figural method of presentation, the names were
withheld.

Immediately following Task 1, the 26-card
array of drawings for Task 2 was given. The free-sort
exercise was modeled after the methods of
Rigney (1962) and Wiviott (1970). The cards were
laid out on a table before the subject. The subject
was directed to examine the cards closely, to
form a group which seemed alike in some way, and
to tell the basis for grouping. After the response
was recorded, the subject was requested to form
another group, continuing the procedure until seven
groups were formed.

In addition to the preceding tasks, each subject
was also given a test to determine his [...]
of the geometric forms used in the tasks. Information
was secured concerning whether the subject
had received instruction in geometry.


Treatment of the Data 


All responses in Task 1 were categorized by
Wiviott’s (1970) four bases of classification: perceptible,
attribute, nominal, and subject fiat. Responses
concerning similarities were tabulated separately
from responses concerning differences. In
Task 2, the fiat category was not used, since Wiviott
noted that it rarely occurred in the sorting
exercise. The system for categorizing Task 1
responses was as follows:


1. Perceptible: The child may render the items
equivalent on the basis of immediate phenomenal
qualities, such as color, size, shape, or on the basis
of position in time or space.

Example: They are alike because they are
both black figures on white cards.
They are both printed in black.
The lines are straight, not slanted.
They are tilted to the right.
This one is round.
One is longer than the other.
They are diamond-shaped.


2. Attribute: The child renders the items equivalent
or diverse by naming a specific attribute of
the concept. 

Example: They all have four sides. 

They are closed figures. 
They are plane figures. 
They are made of line segments. 


3. Nominal: The child may group items by giv- 
ing a name that exists ready-made in the language. 
A supraordinate concept name is used as the basis 
of grouping. 

Example: They are all parallelograms. 

They are diamonds. 

Both the square and the rectangle 
are rectangles. 

They are all geometric figures. 


4. Subject, Fiat: The child may merely state 
that the items are alike or are the same without 
giving any further information as to the basis of 
his grouping, even when he is prodded. 

Example: They are alike. 

They are just different. 


The data were analyzed in the framework of a 
2 × 3 multivariate design with method of presentation
(figural or verbal) included as the independent
variable and grade level (5, 8, or 11) included
as a stratifying variable. The total number of re- 
sponses in each classification category for Tasks 1 
and 2 represented the dependent variables for this 
experiment. Three multivariate analyses of vari- 
ance were performed for each task and for the 
two tasks combined: one for testing differences in 
types of classificatory responses elicited (percepti- 
ble, attribute, nominal), another for testing effects 
due to grade level (5, 8, and 11), and a third for 
testing effects due to method of presentation (fig- 
ural and verbal). The fiat classification category 
in Task 1 was analyzed separately from the main
data.
Mean scores were computed for the subjects who
indicated they had received instruction in geometry 
and for those who indicated they had received no 
instruction. Mean scores were also computed on 
the geometry test for the low-socioeconomic-status 
children in the present experiment and the high-socioeconomic-status
children of Wiviott’s experiment.


RESULTS 


An interrater reliability check was per- 
formed on a random sample of 24 protocols, 
4 subjects from each cell. The protocols 
were scored by 2 independent raters using 
the scoring format as outlined. The percent- 
age of agreement between the two ratings 
was 88.2% for response data on Task 1 and 
89.9% for response data on Task 2. 
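
The agreement figures above are simple percentages of matching category assignments. A minimal sketch of that computation follows; the two short rating lists are invented for illustration and are not data from the study, and the function name `percent_agreement` is hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Return the percentage of responses that two raters placed in the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical category codes assigned by two raters to the same five responses.
rater_1 = ["perceptible", "attribute", "nominal", "perceptible", "fiat"]
rater_2 = ["perceptible", "attribute", "perceptible", "perceptible", "fiat"]

print(percent_agreement(rater_1, rater_2))  # 80.0
```

Percentage agreement does not correct for matches expected by chance, so a chance-corrected index would generally give a lower figure for the same ratings.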

Table 1 gives the mean number of re- 
sponses in each classification category as a 
function of grade level and method of pres- 
entation for Tasks 1 and 2 separately and 
for Tasks 1 and 2 combined. Statistically 
significant differences were obtained among 
the categories of responses when collapsing 
across all subgroups. In Task 1, the mean 
number of perceptible responses (4.90) was 
significantly higher than attribute responses 
(4.57), which in turn was higher than nomi- 
nal responses (.87) (F = 150.82, df = 2/89, 
p < .0001). For Task 2, the mean number 
of perceptible responses (3.30) was signifi- 
cantly higher than attribute responses 
(1.39), while nominal responses (2.32) were 
higher than attribute responses (F = 22.17, 
df = 2/89, p < .0001). The mean number 
of perceptible responses (8.19) for Tasks 1 
and 2 combined was significantly higher 
than attribute responses (5.96), and the 
mean number of attribute responses was 
higher than nominal responses (2.16) (F = 
63.34, df = 2/89, p < .0001). 
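
If each subject's combined score is simply the sum of that subject's Task 1 and Task 2 counts, the combined means should equal the sums of the task means. The sketch below checks the means quoted in this paragraph under that assumption; the tolerance of .02 for rounding and the function name `check_combined` are choices made for the example.

```python
# Means as reported in the text for all subjects.
task1 = {"perceptible": 4.90, "attribute": 4.57, "nominal": 0.87}
task2 = {"perceptible": 3.30, "attribute": 1.39, "nominal": 2.32}
reported_combined = {"perceptible": 8.19, "attribute": 5.96, "nominal": 2.16}


def check_combined(task1, task2, reported, tol=0.02):
    """Compare each reported combined mean with the sum of the two task means."""
    result = {}
    for category in reported:
        expected = round(task1[category] + task2[category], 2)
        consistent = abs(expected - reported[category]) <= tol
        result[category] = (expected, consistent)
    return result


result = check_combined(task1, task2, reported_combined)
for category, (expected, consistent) in result.items():
    print(category, expected, reported_combined[category], consistent)
```

Under this assumption the perceptible and attribute means agree within rounding, whereas the nominal means sum to 3.19 rather than the reported 2.16. The source does not say why; the grade-level nominal means reported later (2.72, 3.03, 3.81) also average about 3.19.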

TABLE 1
MEAN NUMBER OF RESPONSES IN EACH CLASSIFICATION CATEGORY BY SUBGROUPS FOR TASKS 1 AND 2
AND FOR TASKS 1 AND 2 COMBINED

              Task 1                                    Task 2                            Tasks 1 and 2 (combined)
Subgroup      Perceptible  Attribute  Nominal  Fiat     Perceptible  Attribute  Nominal   Perceptible  Attribute  Nominal
Grade 5       ?            ?          ?        ?        ?            ?          ?         9.31         5.63       2.72
Grade 8       4.42         5.47       .56      1.39     2.84         1.69       ?         7.06         7.16       3.03
Grade 11      4.73         ?          1.19     2.43     3.25         ?          2.69      8.03         5.06       3.81
Pictorial     ?            ?          ?        ?        3.46         ?          ?         ?            6.00       2.08
Verbal        4.98         4.54       .92      1.56     3.13         1.85       2.52      8.10         5.90       ?
All subjects  4.90         4.57       .87      1.68     3.30         1.39       2.32      8.19         5.96       2.16

Note. ? = value not legible in the source. The combined grade-level means and the Task 1 nominal mean for all subjects are taken from the text.

Grade level was not significantly related
to the bases of classification
employed by subjects, either in Task 1 (F
= 1.81, df = 6/176, p < .09) or in Task 2
(F = 1.08, df = 6/176, p < .15) when the
tasks were treated independently. However, 
grade level was found to be significant 
when data from both tasks were combined. 
Univariate F statistics showed that the per- 
ceptible and attribute bases of classification 
were significantly different with respect to 
grade level for the perceptible category (F 
= 3.84, df = 2/90, p < .03) and for the 
attribute category (F = 3.12, df = 2/90, p 
< .05). Fifth graders classified relatively 
more than eighth graders on perceptible 
bases (fifth = 9.31, eighth = 7.06), whereas 
eighth graders classified relatively more 
than fifth graders on attribute bases (fifth 
= 5.63, eighth = 7.16). Surprisingly, how- 
ever, fewer attribute and more perceptible 
responses were given by the eleventh grad- 
ers as compared to the eighth graders (elev- 
enth: perceptible = 8.03, attribute = 5.06). 
This anomalous finding is examined later in 
the Discussion section of this paper. The 
mean number of nominal responses increased
with grade level, although not significantly
(fifth = 2.72, eighth = 3.03,
eleventh = 3.81). 

The subject-fiat category of classificatory 
responses in Task 1 was analyzed sepa- 
rately. This category was used to designate 
subjects’ responses which did not fit into one 
of the other three categories. (Subjects would 
customarily respond with the words “I 
don’t know.”) The mean number of subject- 
fiat responses was not significantly different 
for the three grade levels or for the two 
methods of presentation. 

Method of presentation was found to be 
not significant in Task 1, apparently be- 
cause of the preponderance of perceptible 
responses. Therefore, different from Wi- 
viott’s results, the classifying behaviors of 
subjects given figural stimuli and verbal 
stimuli were relatively the same. 

Last, no significant interactions in the 
bases of classifying occurred between grade 
level and method of presentation. 

The most interesting findings from this 
study are that low-socioeconomic-status 
children classified primarily on perceptible 
bases and that eleventh graders used the 
perceptible basis as much as the eighth
graders. Information concerning whether
the children had received instruction dealing
with geometric forms was considered in
light of these findings. 

At the end of the testing sessions, each 
subject was asked whether he had received 
any prior classroom instruction dealing 
with the geometric concepts employed in
this study. At the fifth-grade level, 25% of
the subjects reported having prior instruction;
at the eighth-grade level, 41% said
they had; but at the eleventh-grade level,
no subject reported having any prior instruction.
School records confirmed what
was reported in the questionnaires by subjects.
No eleventh-grade subject in this
study had previous instruction in geometry
according to school records. Curriculum
consultants for the school district also
stated that the fifth- and eighth-grade subjects,
depending on the school attended,
were affected by an upgrading in the elementary
school mathematics program
which in recent years has included geometric
concepts. Eleventh-grade subjects were
unaffected by this curriculum change. In
contrast to this lack of instruction, all eleventh-grade
subjects participating in the
study by Wiviott had prior geometry
instruction.

The categorizing responses of the subjects
who had and had not received instruction
were analyzed. Subjects who indicated previous
instruction performed differently on
both tasks than subjects who did not have
this experience. Perceptible responses were
fewer, while attribute responses were
greater for subjects with a geometry background
(geometry: perceptible = 5.62, attribute
= 9.15; no geometry: perceptible =
6.11, attribute = 7.89).

Subjects in both experiments were administered
the same test of geometric concepts.
This test was constructed according to a
paradigm outlined by Frayer, Fredrick, and
Klausmeier (1969) and consisted of [...]
items designed to test mastery of the concepts.
Results of the concept-mastery test
showed much higher mean scores for the
high-socioeconomic-status children as compared
to the low-socioeconomic-status children
at each of the three grade levels (high
socioeconomic status: fifth = 33.3, eighth
= 37.9, eleventh = 47.2; low socioeconomic
status: fifth = 21.4, eighth = 28.8, eleventh
= 28.4). These mean scores are consistent
with the patterns of classificatory behaviors 
reported on the two tasks and suggest that 
the various bases of classifying are directly 
related to the level of concept mastery; that 
is, mastery increases as the attribute and 
nominal bases are used and remains low 
and constant when the perceptible basis of 
classifying is employed. 


Discussion 


The present experiment was undertaken
as the second in a series to ascertain
whether the classificatory behaviors of low-socioeconomic-status
children were the
same as those of middle- and high-socioeco- 
nomic-status children. One important find- 
ing of the two studies concerns the more 
immature bases of classification used by the 
low-socioeconomic-status children who 
showed a greater mean number of percepti- 
ble responses than the high-socioeconomic- 
status children at all three grade levels (low 
socioeconomic status: fifth = 9.31, eighth 
= 7.06, eleventh = 8.03; high socioeco- 
nomic status: fifth = 5.56, eighth = 4.00, 
eleventh = 2.59). The low-socioeconomic- 
status children also employed fewer attri- 
bute responses than the high-socioeconomic- 
status children (low socioeconomic status: 
fifth = 5.63, eighth = 7.16, eleventh =
5.06; high socioeconomic status: fifth = 
7.38, eighth = 8.25, eleventh = 9.00) and 
gave fewer nominal responses (low socioeconomic
status: fifth = 2.72, eighth = 3.03,
eleventh = 3.81; high socioeconomic status: 
fifth = 4.19, eighth = 5.13, eleventh = 
1.09). 

An unexpected outcome of the present
study was the absence of a consistent developmental
change. The expected decrease in the use of
the perceptible basis of classification and 
the related increase in the use of the attri- 
bute basis of classification did occur from 
Grades 5 to 8. The eleventh-grade group, 
however, did not follow this pattern. 

Relating these experimental findings to
instruction and to lack of instruction indicated
that the low-socioeconomic-status
children who had received instruction used
the perceptible basis less and the attribute 
and nominal bases more than those who 
had not received instruction. Also, it was 
shown that there was no overlap in the 
mean scores of the low- and the high-socio- 
economic-status children on a test of geo- 
metric concepts. Here the fifth-grade high- 
socioeconomic-status group had a higher 
mean score (33.3) than the eighth-grade low- 
socioeconomic-status (28.8) group, which 
was the highest mean score of the three low- 
socioeconomic-status grade groups. 

It appears that a combination of low so- 
cioeconomic status and lack of instruction 
is associated with immature conceptualizing 
behaviors and low achievement. Apparently 
low-socioeconomic-status children who do 
not receive instruction remain at an imma- 
ture level of conceptualizing, using the per- 
ceptible properties of the concept examples 
rather than their defining attributes and 
names. Concept attainment for these chil- 
dren remains at a relatively low level. 

These conclusions are of substantial theo- 
retical interest and practical significance. 
On the theoretical side, it is noted that the 
orderly change in the bases of classifying 
found by Olver (1961), Rigney (1962), and 
Wiviott (1970) did not occur in the low- 
socioeconomic-status children, particularly 
between Grades 8 and 11. As Bruner et al. 
(1966) conjectured, it appears that instruc- 
tion is in fact related not only to the partic- 
ular concepts that individuals attain 
but also to their means of conceptualizing. 
Other investigators have also reported con- 
siderable differences between socioeconomic 
groups in classificatory behaviors (Findlay 
& McGuire, 1957; Raven, 1967; Siller, 
1957). Some have also noted that educa- 
tional experience tends to mitigate these 
differences (Evans & Segall, 1969; Schmidt 
& Nzimande, 1970). 

On the practical side, it is probable that 
the eleventh-grade children of low socioeco- 
nomic status in this study will never func- 
tion at a reasonably mature level with re- 
spect to being able to classify geometric 
forms and to use concepts involving geo- 
metric forms. Their classificatory behaviors 
undoubtedly will continue to be typical of
younger elementary school age children.
This appears unfortunate in view of the 
fact that the younger low-socioeconomic- 
status subjects who received instruction 
were beginning to operate at the more mature
levels.


Learning-potential assessment is hypothesized to be more sensitive
than traditional IQ tests in tapping the intellectual potential of dis- 
advantaged children. The Series Learning Potential Test was admin- 
istered three times to bright-normal, dull-to-average and subnormal 
children with training in problem-relevant strategies interpolated fol- 
lowing the second administration. Both low-IQ groups gained more 
than the high-IQ group from the training, and the dull group gained 
more than the other groups from repeated administrations without 
training. IQ was found to predict teacher ratings of school achieve- 
ment for all groups and the Series Learning Potential Test was found 
to predict teacher ratings for the bright group; however, the Series 
Learning Potential Test was superior to IQ as a predictor in the dull- 
to-average and the subnormal groups. Substantial proportions of sub- 
normal subjects reached the average reasoning level of their non- 
retarded peers following the short training session. 


Children from poor and/or nonwhite 
homes tend to score at below-average levels 
on tests which purport to measure intelli- 
gence. Jensen (1969) has argued that the 
mean difference of 15 IQ points between 
white and black groups represents a real
difference in inborn general ability. How- 
ever, the IQ difference has also been fre- 
quently explained by the handicaps that 
poor and/or nonwhite children bring to the 
testing situation. They are fearful of the 
testing process, expect to do poorly, are 
often insensitive to speed requirements, are 
unfamiliar with the problem contents, and 
do not spontaneously develop the most effective
strategies (by middle-class criteria)
to solve the problems. 

IQ tests measure the degree to which 
children have spontaneously acquired from 
their natural environment the skills and 
knowledge which cumulatively predict aca- 
demic school success. The plausible assump- 
tion is made that a child who learned infor- 
mally prior to entering school will continue 
to learn—formally and informally—in and 
out of school. Children from non-middle- 
class homes who do not have an equal and 
frequent access to school-preparatory expe- 
riences tend to score poorly on IQ tests and 
are often viewed as “less intelligent.” 

Yet, many of these same low-IQ children 
are competent problem-solvers in their non- 
school environment, having mastered the 
skills, knowledge, and strategies necessary 
to maintain a successful adjustment. These 
children do learn and profit from relevant 
experiences more successfully than their IQ 
scores and school achievements indicate. In 
Hunt's (1961) terms, this difference in com- 
petence may represent the problem of the 
match between the environmental demands 
of the school and its tests, and the child's 
existing schemata in his familiar world. As- 
sessment procedures must be developed 
which will optimize the match and result in
more culture-fair measurement of general 
ability. 

Eells, Davis, Havighurst, Herrick, and 
Tyler (1951) developed problem games that 
were deemed relevant to the experiences of 
low-socioeconomic-status children, but 
found no change in the gap between the 
different social class groups. This may have 
been due to the demand that the child go 
about solving the problems as middle-class 
psychologists thought he should. Changing 
the contents of the problems was not suffi- 
cient to enable the lower-class children to 
narrow the gap between their problem-solv- 
ing styles and those expected by the mid- 
dle-class bias of the test constructors. 

Budoff (1969) has described a learning- 
potential procedure for assessing ability 
among low-IQ, low-socioeconomic-status 
children based on a process-oriented con- 
ceptualization of intelligence. The focus of 
the learning-potential paradigm is on the 
child's trainability, that is, his ability to 
improve performance on reasoning prob- 
lems following a systematic learning experi- 
ence. Reasoning is viewed as the critical 
ability. The reasoning tasks are administered
in a "test-train-test" sequence.
Training allows the low-socioeconomic-sta- 
tus child to understand how to solve the 
problems when the contents of the problems 
may be strange and the appropriate strate- 
gies are not readily apparent to him. 

Training is hypothesized to be particu- 
larly critical for the child from a poor and/ 
or nonwhite background, who may learn 
different cognitive strategies in different ex- 
pressive formats than those presumed to be 
available by the tests. The training helps 
the child to narrow the cognitive gap be- 
tween his previously learned problem-solv- 
ing strategies and those implicit to the 
problems he must ordinarily solve on the 
middle-class-bias tests he encounters. The
test-train-test paradigm also minimizes the 
artificiality of the test situation. The re- 
peated contacts with the materials in a con- 
text of support and teaching allow the 
school-failing child to develop a sense that 
he can be competent. Without this compe- 
tence boost, he tends not to perform at his 
best, implicitly expecting failure (Zigler,
1966). The essence of this assessment strategy,
then, is to impose some control on the
potentially negative effects on his test
performance of prior life experiences.

The pretest reflects the subject’s present
level of functioning and his existing ability
to work with the problems. Posttraining
scores reflect the child's ability under
optimized conditions in which all subjects are
familiar with the task and its demands,
have had success in solving problems similar
to those on the test, and have had the
opportunity to learn and apply relevant
strategies. Budoff and Corman (1974) have
shown that educable mentally retarded
(EMR) pretest scores correlate highly with
Stanford-Binet and Wechsler scores and
other indices of psychosocial vulnerability
associated with socioeconomic status.
Posttraining scores were correlated only with
performance scale scores.

The problems employed in the learning-
potential tests fall within Jensen's (1969)
Level II category (conceptual learning).
According to Jensen, there is little overlap
between the Level II curves of lower- and
higher-socioeconomic-status groups. Jensen's
theory is based on the spontaneous
responses evoked in traditional intelligence
testing. Since these testing procedures
underestimate the potential Level II ability of
substantial proportions of disadvantaged
children, one would expect to find greater
overlap between the curves of lower- and
higher-socioeconomic-status groups on
learning-potential posttests than on
learning-potential pretests and other tests which
tap the students’ spontaneous productions.
Complete overlap between the Level II
curves of middle- and low-socioeconomic-
status children is not assumed. Inclusion of
training procedures within ability tests
may identify the numerous false positives
among the disadvantaged (Ortar, 1960).

To date, nonverbal reasoning tasks have
been used since these are not dependent on
language, which represents one major area
of these children's difficulties. Three
nonverbal tasks have been employed in
assessment of learning potential: an altered
version of the Kohs Block Designs (Budoff,
1969), Raven's Progressive Matrices (Budoff,
Corman, & Litzinger, 1973), and the
Series Learning Potential Test (Babad & 
Budoff, 1971). The Series Learning Poten- 
tial Test, used in the present study, is 
a group test for the primary and elemen- 
tary grades. Its major task is the com- 
pletion of series of pictures or geometric 
forms, arranged in a pattern in which the 
figures change systematically. Each item 
presents a horizontal row of cells each of 
which contains a stimulus figure. One cell is 
left blank. The subjects must identify 
among the multiple choices on the right, the 
picture which best completes the series. 
Four concepts may vary in a series: semantic
content (meaningful or geometric figures),
size (large/small), color (black/
white), or orientation (up/down, or left/ 
right). The concepts may vary symmetri- 
cally or asymmetrically and the blank 
space may be placed in any part of the 
series. The hardest items are those in which 
the blank space appears early in the series, 
and the concepts change asymmetrically. 

Two equivalent 65-item forms of the
test are used, with items corresponding in 
concepts but with different pictures in simi- 
lar arrangements. None of the 17 coaching 
items is identical to any of the 130 test
items. The test contains several types of 
items. Forty items consist of meaningful 
pictures ranging from easy items (2 con- 
cepts are arranged symmetrically and the 
blank space appears at the end of the se- 
ries) to more difficult ones (asymmetric al- 
ternation of 3 concepts, blank space in ini- 
tial position). Improvement in solving the 
meaningful picture series on the posttest
indicates the child can solve nontrained
instances of the trained items. Ten items with
geometric figures discern whether the sub- 
ject can transfer the strategies learned in 
training with meaningful pictures to dissimilar
stimuli. Fifteen double-classification
items (5 matrices in meaningful pictures, 10
in geometric symbols) presented in a matrix
format test for generalization of the
learned strategies to problems that require
the same reasoning process in a different
arrangement. The strategies may also be
mediated differently.


The general hypothesis of the two studies 
reported is that training increases the sensi- 
tivity of the general ability test with low- 
IQ, low-achieving students. Thus, bright, 
middle-class subjects of the same chrono- 
logical age should not demonstrate marked 
gains following practice or training since 
they perform at optimal levels on the pre- 
test. They either know the solutions or 
know how to develop the solutions to the 
reasoning problems prior to training. These 
children's pretraining scores should there- 
fore approximate their posttraining scores, 
and the dispersion of scores should decrease 
since training should help the laggards per- 
form better on the task. Poor, low-IQ stu- 
dents, who suffer from the middle-class bias 
of the IQ test, should do poorly on the trial 
prior to training. Training should increase 
their mean scores and the dispersion, identi- 
fying those who are potentially able but 
who have failed because of the differences 
between their native and the school envi- 
ronments. 

Second, despite the large differences in IQ
between the groups, it was hypothesized 
that some proportion of the low-IQ children 
would attain the pretest level of their mid- 
dle-class peers following training. Training 
is hypothesized to facilitate “induced ac- 
quisition” of the problem-relevant strategy, 
which compensates for the middle-class 
children’s spontaneous acquisition. Support 
for this hypothesis would also indicate that 
some low-IQ children are not inferior to 
their middle-class peers in potential ability 
and that some IQ-defined EMRs are educa- 
tionally rather than mentally retarded. 

Both studies include students from three 
IQ groups: (a) bright, normal middle-class
children (IQs above 100); (b) blue-collar,
dull-to-average children (IQ, 80 to 99);
and (c) poor, subnormal, educable mentally
retarded children (IQs below 80). All students
were given the Series Learning Potential
Test three times in a “test-test-coach-test”
sequence. In this design, the practice
effect ($T_2 - T_1$), the training effect
($T_3 - T_2$), and the combined effect
($T_3 - T_1$) can be separated and examined
independently.
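
The separation of the three effects can be sketched in a few lines of Python. This is only an illustration of the arithmetic implied by the test-test-coach-test design; the class name `Scores`, the function `gain_scores`, and the example values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Raw scores of one child on the three administrations (hypothetical container)."""
    t1: int  # first administration, uncoached
    t2: int  # second administration, still uncoached
    t3: int  # third administration, after the coaching session


def gain_scores(s: Scores) -> dict:
    """Split the total change into a practice part and a training part."""
    return {
        "practice": s.t2 - s.t1,   # change from repetition alone
        "training": s.t3 - s.t2,   # change following coaching
        "combined": s.t3 - s.t1,   # overall change
    }


# Hypothetical child: 30 items correct, then 36, then 45 after coaching.
example = gain_scores(Scores(t1=30, t2=36, t3=45))
print(example)
```

By construction the practice and training gains sum to the combined gain, which is why the design lets the two effects be examined independently.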

Five specific hypotheses were tested: 
1. Low-IQ children gain from repeated,
uncoached exposures to the Series Learning 
Potential Test more than high-IQ children 
(practice effect, to be tested in Study 1). 

2. Low-IQ children gain from training on 
problem-relevant strategies more than high- 
IQ children (training effect, to be tested in 
Study 1). 

3. Posttraining scores predict school 
achievement of low-IQ groups better than 
group IQ scores do. This is not true for the 
high-IQ group (to be tested in Study 2). 

4. Posttraining scores of the low-IQ
group predict their school achievement bet- 
ter than pretraining scores. For the high-IQ 
group, however, pretraining scores predict 
school achievement as well as posttraining 
scores. 

5. Following training, substantial propor- 
tions of IQ-defined EMR subjects will at- 
tain the mean pretest score of their nonre- 
tarded peers (to be tested in both studies). 


STUDY 1


The Series Learning Potential Test was 
administered three times to groups of 
bright, dull-to-average, and subnormal
(EMR) children in the middle elementary
school years. A group training session was
interpolated following the second test
administration. Three scores were calculated
for each subject: practice score ($T_2 - T_1$),
training score ($T_3 - T_2$), and a combined
gain score ($T_3 - T_1$).


Method 


Subjects. Subjects were 126 white children (58 
boys, 68 girls) in the third, fourth, and fifth grades 
of several New England schools. The subjects were 
divided into three groups according to their IQ 
scores on the Test of General Ability (Flanagan,
1960):


bright-normal [n = 64, 21 boys and 43 girls; M
IQ = 113 (±12); IQ ≥ 100; predominantly from
middle-class, suburban homes];

dull-to-average [n = 37, 17 boys and 20 girls; M
IQ = 85 (±7); 80 ≤ IQ ≤ 99; predominantly from
blue-collar homes in an inner-city district]; and

subnormal (EMR), drawn from special classes for
the mentally retarded [n = 25, 20 boys and 5 girls;
M IQ = 68 (±7); IQ < 80; from blue-collar homes
in an inner-city district].


There was no indication in the school records of 
organic brain pathology in any subject. 
Instruments. In this study, the Test of General
Ability (Flanagan, 1960) was used to divide the
subjects into three IQ groups, while the three
learning-potential scores (Babad & Budoff, 1971) served
as the dependent variables.

The Test of General Ability is a group IQ test
consisting of verbal and reasoning parts. Both parts
are multiple-choice tests, with pictorial stimuli in
the verbal part and abstract symbols in the reasoning
part. Three scores are derived from the Test
of General Ability: Verbal IQ, Reasoning IQ, and
Total IQ.

The Series Learning Potential Test, as described
above, consists of two equivalent 65-item multiple-
choice tests (alternate form reliability is .84). Form
A is used for the pretest and Form B for the posttest.
Each form is administered as a power test and
usually takes 30-35 minutes to complete. The
training booklet consists of 17 items, all of which
are solved together by the entire class. Five
strategies are taught and practiced during training:


1. The students learn to identify each concept
that changes.
2. They learn to “sing the tune” for each concept,
one at a time, as an organizational or chunking
aid to identify the pattern of each concept.
3. To reduce the memory load in a multiconcept
item, the student crosses out the wrong choices for
each concept.
4. The child learns to reverse the direction in
which the tune is sung when the location of the
blank space calls for such reversal, for example, a
blank at the beginning of a series.
5. The child learns to identify the starting point
of a concept when the series starts in the middle of
the pattern.
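
The five strategies above can be read as a small procedure, sketched here in Python purely as an illustration. It is not the authors' training material: the representation of a cell as a dictionary, the function names `predict` and `solve`, and the example series are all assumptions. For each concept taken one at a time, the sketch finds the shortest repeating pattern (the "tune") that fits the filled cells, reads the blank off that pattern in whichever direction is needed, and crosses out the choices that disagree.

```python
def predict(tune, blank):
    """Return the value the blank should take for one concept, or None.

    `tune` lists one concept's values across the series, with None at the
    blank. The shortest period that fits all filled cells is used, and the
    blank is copied from a filled cell a whole number of periods away, so a
    blank at the start of the series is handled like any other position.
    """
    n = len(tune)
    for period in range(1, n):
        consistent = all(
            tune[i] == tune[i + period]
            for i in range(n - period)
            if i != blank and i + period != blank
        )
        donors = [j for j in range(n)
                  if j != blank and (j - blank) % period == 0]
        if consistent and donors:
            return tune[donors[0]]
    return None


def solve(series, choices):
    """Cross out wrong choices one concept at a time; return what is left."""
    blank = series.index(None)
    remaining = list(choices)
    for concept in choices[0]:
        tune = [cell[concept] if cell is not None else None for cell in series]
        wanted = predict(tune, blank)
        if wanted is not None:
            remaining = [c for c in remaining if c[concept] == wanted]
    return remaining


# Hypothetical item: size alternates large/small, color repeats
# black/black/white, and the blank is the first cell.
sizes = ["large", "small", "large", "small", "large", "small", "large"]
colors = ["black", "black", "white", "black", "black", "white", "black"]
series = [{"size": s, "color": c} for s, c in zip(sizes, colors)]
series[0] = None
choices = [{"size": s, "color": c}
           for s in ("large", "small") for c in ("black", "white")]
answer = solve(series, choices)
print(answer)
```

In this hypothetical item the two concepts change with different periods and the blank comes first, which corresponds to the kind of item the text describes as hardest.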


All 17 problems are successfully solved by the
end of the 30-45 minute training session.

Procedure. The study was conducted during 10
sessions in the spring of the school year. The Series
Learning Potential Test was first administered in
three sessions, with two-day intervals between
sessions. All subjects received Form A (pretest) in the
first session. In the second session, half of the
subjects received Form A again, and the other half
Form B. For both groups, the training session
followed immediately. In the third session, all
subjects received Form B (posttest). The Test of
General Ability was administered several days
later. All tests were administered by experienced
examiners, following standard instructions. In each
class, all tests were administered by the same
person.


Results 


Practice, training, and combined gain
scores were calculated for the three groups.
The patterns of these scores are presented
in Figure 1. The means and standard deviations
of the three groups in initial and final
testing are presented in Table 1.

Figure 1. Practice gain, coaching gain, and combined gain of the three groups.


To test the first and second hypotheses, 
separate one-way analyses of variance were
computed on practice, training, and com- 
bined gain effects for the three IQ groups. 
These analyses were followed by t tests be- 
tween the three possible pairs of groups. As 
can be seen in Figure 1 and Table 1, the 
first and second hypotheses were supported. 
Repeated administrations without training 
resulted in differential score increments for 
the three groups (F = 16.3, df = 2/125, p
< .001). The dull-to-average group gained 
from practice significantly more than did 
the subnormal (t = 2.54, df = 60, p < .01) 
and the bright-normal groups (t = 3.33, df 
= 99, p < .001), while the difference in
practice gains between the latter groups
was nonsignificant. Training produced dif- 
ferential increments in score (F = 2.4, df = 
2/125, p < .10), but the pattern differed 
from that found for practice. Both the sub- 
normal and the dull-to-average groups 
gained more from training than did the 
bright-normal group (t = 2.69, df = 87, p
< .005; t = 2.26, df = 99, p < .025, respectively).
The highest mean gain in training
was attained by the subnormal group, but 
the difference in the training effect between 
the two low-IQ groups was not significant. 

The analysis of variance of the combined 
gain scores indicated differential increments 
in score (F = 9.32, df = 2/125, p < .001).


TABLE 1
Means and Standard Deviations of SLPT and TOGA Scores of the Three Groups in Study 1

                          SLPT initial      SLPT final
                          performance       performance       TOGA IQ
Group               n      M      SD         M      SD        M      SD
Bright-normal      64    51.9    5.0       55.4    4.7      113    12.0
Dull-to-average    37    37.6   10.1       47.3    9.9       85     7.0
Subnormal          25    26.4   10.7       35.0   12.3       68     7.0


Note. Abbreviations: SLPT = Series Learning Potential Test and TOGA = Test of General Ability. 
Both low-IQ groups gained more than did
the bright-normal group (t = 4.08, df = 99,
p < .001; t = 2.89, df = 87, p < .005, for
the bright versus dull and subnormal 
groups, respectively), while the difference 
between the low-IQ groups was not signifi- 
cant. 
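
The degrees of freedom reported for the pairwise t tests follow from the group sizes given in the Method section. The short Python check below is a sketch under the assumption that ordinary independent-samples t tests were used, so that df = n1 + n2 - 2; the source does not name the exact test variant, and the helper name `pooled_df` is hypothetical.

```python
from itertools import combinations

# Group sizes as reported for Study 1.
group_n = {"bright-normal": 64, "dull-to-average": 37, "subnormal": 25}


def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom of an independent-samples t test (assumed variant)."""
    return n1 + n2 - 2


pair_df = {
    frozenset((a, b)): pooled_df(group_n[a], group_n[b])
    for a, b in combinations(group_n, 2)
}

for pair, df in pair_df.items():
    print(sorted(pair), df)
```

The three values agree with the df of 99, 87, and 60 quoted in the text for the bright versus dull, bright versus subnormal, and dull versus subnormal comparisons.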

Since half the subjects received Form A 
or B of the Series Learning Potential Test 
in the second session, the analyses of vari- 
ance were recomputed separately for sub- 
jects who took each form. All effects and 
patterns were similar, and the means of the 
groups were almost identical. 

Repeated administrations of the Series 
Learning Potential Test—with and without 
training—thus resulted in differential ef- 
fects on the three groups. The disadvan- 
taged, low-IQ children learned and profited 
from training on problem-relevant strate- 
gies more than did their middle-class, high- 
IQ peers. The dull-to-average group gained 
from practice or training, while the subnor- 
mal sample seemed to profit most from 
training. 

The relative spread of Test of General 
Ability and Series Learning Potential Test 
scores for the three groups is also indicative 
of the sensitivity of learning-potential as- 
sessment in the low-socioeconomic-status, 
low-IQ groups. While the standard devia- 
tion of IQ scores for the bright-normal 
group (12) is almost twice as large as the 
standard deviations for both low-IQ groups
(both 7 points), the picture is reversed for
Series Learning Potential Test scores. The 
standard deviations of both low-IQ groups 
are twice as large as those of the bright- 
normal group, both in initial and final test- 
ing (Table 1). The differential effect of 
training is indicated in the changes of the
Series Learning Potential Test standard deviations
from initial to final testing. A
slight shrinkage in standard deviation was 
found for the bright group as compared 
with the 20% increase in the standard de- 
viation of the subnormal group. 

The possibility that the observed pattern 
of results was caused by a ceiling effect for 
the bright-normal group in the posttest was 
explored by checking this group’s perform- 
ance in the posttest. The mean score (55.4) 
was more than two standard deviations below
the ceiling (65) of the Series Learning
Potential Test. The distribution of scores
approximated the normal curve, with no
indication of that skewedness which typically
characterizes ceiling effects.
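
The ceiling check amounts to one line of arithmetic, sketched below in Python. The mean (55.4) and the ceiling (65) are stated in the text; the standard deviation is taken as 4.7, the bright-normal final-testing value in Table 1. The function name `sds_below_ceiling` is a hypothetical helper.

```python
CEILING = 65                   # number of items on one form of the test
BRIGHT_POSTTEST_MEAN = 55.4    # bright-normal group, final testing
BRIGHT_POSTTEST_SD = 4.7       # bright-normal group, final testing (Table 1)


def sds_below_ceiling(mean: float, sd: float, ceiling: float) -> float:
    """Distance from the mean to the maximum score, in standard deviations."""
    return (ceiling - mean) / sd


headroom = sds_below_ceiling(BRIGHT_POSTTEST_MEAN, BRIGHT_POSTTEST_SD, CEILING)
print(round(headroom, 2))  # about 2.04
```

A mean roughly two standard deviations below the maximum is consistent with the statement that the bright-normal distribution was not compressed against the ceiling.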


STUDY 2


The validity of Series Learning Potential
Test scores and Test of General Ability IQ
in predicting teacher-rated school success
was compared for bright-normal, dull-to-
average, and subnormal (EMR) samples.
The Series Learning Potential Test was
predicted to be superior in validity to the
Test of General Ability in the low-IQ but
not in the high-IQ range. The validity of
the Series Learning Potential Test was
expected to improve from pretest to posttest
in the low-IQ but not the high-IQ range.


Method 


Subjects. The subjects of the first study were
selected from separate schools, the EMR sample
consisting of students in segregated special classes
for the retarded. To equate samples, schools,
classes, and teacher grades, a new sample was
selected with a wide representative range of IQs
in each class. In this sample, all EMR subjects
were integrated in regular classes.

Subjects were 207 white children in 12 …-grade
classes of three elementary schools. The
schools were located in a small New England town
with a predominantly white, working-class population.
The subjects were divided into three groups
according to their Test of General Ability IQ
scores: bright-normal [n = 76, M = 11…,
IQ ≥ 100]; dull-to-average [n = 95, M = …,
80 ≤ IQ ≤ 99]; and subnormal [n = 36, M = …
(±6.5), IQ < 80]. There was no indication in the
school records of organic brain pathology in any
subject. Subjects for all three groups were found
in each of the 12 classes.

Instruments. As in the first study, the Series
Learning Potential Test and the Test of General
Ability were used. School achievement was measured
by teacher ratings for reading, spelling,
arithmetic, “general achievement," and “academic
potential.” The ratings were done on an …
scale, corresponding to the letter grade system
which the teachers are accustomed to use in the
school system.

Procedure. The study was conducted in …
sessions during the spring of the school year. In
the first session, the Series Learning Potential Test
pretest (Form A) was administered to all subjects,
followed by the standardized training. Teachers
completed the rating scales during this session. The
Series Learning Potential Test posttest was
administered in the second session, three days after
the first. The Test of General Ability was
administered several days later by the same tester.


Results 


The relationships of learning-potential 
and IQ variables to teacher ratings of 
school achievement are presented in Table 2 
for the entire sample—the subnormal, dull- 
to-average, and bright-normal groups, re- 
spectively. The correlations of learning po- 
tential and IQ with achievement ratings 
were of the same magnitude in the total 
sample. As hypothesized, the learning-po- 
tential predictions of achievement were 
higher than the IQ predictions for the sub- 
normal and dull-to-average samples. For 
the bright-normal group, there was almost 
no difference in predictive power. In fact, 
the posttest, learning-potential correlations 
for the two low-IQ groups were more than 
twice as large as those of IQ scores for these 
groups. Because of the restricted range, all 
subsample correlations decreased; but the 
predictability of IQ scores suffered more 
from this restriction than did the learning- 
potential scores for the low-IQ groups. The 
third hypothesis was thus supported. The 
validity of the Series Learning Potential 
Test was not inferior to that of the Test of 
General Ability IQ in the overall sample 
and was superior to IQ in the restricted 
low-IQ range. 

As to the fourth hypothesis, the expected 
changes in validity of the Series Learning 
Potential Test from pretest to posttest were 
found (increase for the low-IQ groups; no 
increase for the high-IQ group), but the
changes were small and inconclusive. The
hypothesis was thus not confirmed by the
data. One should also note the relative
superiority of Reasoning-IQ predictions over
Verbal-IQ predictions for both low-IQ 
groups and the absence of such differences 
in the bright-normal group. 

One may view another strength of the 
training-based learning-potential measure- 
ment by examining the proportion of IQ- 
defined EMR subjects who attain the mean 
pretest score of their nonretarded agemates
following training. According to the trainability
hypothesis, EMR subjects who reach
the pretraining reasoning level of nonretardates
following a short training session may
have been falsely identified as mentally retarded.
Proportions of EMR subjects falling
above the mean pretest score of nonretarded
groups (dull-to-average and bright-normal)
were calculated for the samples in
Study 1 and Study 2.

The changes from pre- to posttraining 
proportions are quite marked, indicating 
that following a short problem-relevant 
learning experience, substantial proportions 
of IQ-defined EMR subjects do attain the 
average (noncoached) reasoning level of 
their nonretarded peers. The fifth hypothe- 
sis was thus supported. Among special-class 
EMRs, 36% and 13% attained scores above
the pretraining means of the dull-average 
and bright-normal groups, respectively. 
The mean EMR posttest performance ex- 
ceeded the mean pretest level of the dull-to- 
average group, despite the 16-point IQ dif- 
ference between the groups, and 61% of the 
EMRs scored above this level. More than a
third of the EMR subjects (36%) surpassed
the mean pretest score of the bright group, 
despite a 41-point (!) difference between 
the average IQs of the two groups. 
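
The proportions discussed above are simple counts against a reference mean. The Python sketch below shows the computation only. The two reference values are the Study 1 initial-testing means from Table 1 (37.6 for the dull-to-average group, 51.9 for the bright-normal group), while the list of EMR posttest scores is invented for illustration and is not data from either study.

```python
def proportion_above(scores, reference_mean):
    """Share of subjects whose score exceeds a reference group's mean."""
    return sum(score > reference_mean for score in scores) / len(scores)


DULL_PRETEST_MEAN = 37.6     # Table 1, dull-to-average, initial testing
BRIGHT_PRETEST_MEAN = 51.9   # Table 1, bright-normal, initial testing

# Hypothetical posttraining scores for ten EMR subjects (not study data).
emr_posttest = [20, 28, 30, 33, 36, 38, 41, 45, 52, 55]

above_dull = proportion_above(emr_posttest, DULL_PRETEST_MEAN)
above_bright = proportion_above(emr_posttest, BRIGHT_PRETEST_MEAN)
print(above_dull, above_bright)
```

Running the same function on pretest and posttest scores gives the pre- to posttraining change in proportions that the paragraph describes.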


Discussion 

The results indicate that learning-potential
testing did show a higher level of differentiation
among low-IQ, low-socioeconomic-status
subjects than the standard
test of intelligence. Utilization of a learn- 
ing-potential assessment paradigm con- 
firmed considerable ability to reason among 
low-IQ children which was not indicated by 
standard measures of intelligence. Disad- 
vantaged dull-to-average and EMR groups 
profited from training on problem-relevant 
strategies more than did the bright-normal 
group, while the dull-to-average group 
gained from practice or training appropri- 
ate to the task. Substantial proportions of 
IQ-defined EMR children reached the reasoning
level of their brighter peers following
a short training session on problem-relevant 
strategies. The observed sensitivity of 
learning-potential measurement in the low- 
IQ range was further borne out by the superiority
of the Series Learning Potential
Test to the Test of General Ability IQ in
predicting school achievement. While the
predictive power of both tests was of the 
same magnitude in the bright-normal 
group, the posttraining score was a better 
predictor than IQ in the dull-to-average 
and EMR groups. 

To permit reasonable predictions con- 
cerning educability in a school setting, rea- 
soning abilities of the child must be tested. 
These abilities are inevitably tainted by 
differences in the prior cultural experiences 
of the subjects. By providing all children 
with directed training on problem-relevant 
strategies, the learning-potential paradigm 
seeks to bring all children to a more equal- 
ized starting point prior to testing, thereby 
reducing the advantage of the high-socioec- 
onomic-status children. Indeed, there was a 
considerably greater overlap of scores 
among groups differing in IQ following 
training than there had been prior to train- 
ing. Given the opportunity to become fa- 
miliar with the demands of the task and to 
learn how to approach the problems, the 
lower-IQ children from blue-collar and poor 
backgrounds displayed considerably more 
competence, confidence, and sense of chal- 
lenge on the task than they had previously. 
In Hunt’s (1961) terms, it could be said 
that for the posttraining measurement, 
there was a better match between the chil- 
dren's existing schemata and the demands 
of the situation. For the middle-class chil-
dren, the “problem of the match” did not 
exist, and they performed as well on the 
pretest as on the posttest. 

The increased overlap between reasoning 
curves of high- and low-socioeconomic-sta- 
tus subjects following coaching indicates 
that the Level II ability of portions of dis- 
advantaged populations is greater than that 
posited by Jensen on the basis of one-shot 
reasoning tests. Training permitted identifi- 
cation of those poor, low-IQ children who 
have potential for functioning within a 
framework of Level II thinking. 

Budoff, Meskin, and Harrison (1971) 
have demonstrated how to apply this evidence
of reasoning ability within a classroom
curriculum. They tested the acquisition
of principles of electricity in a manipulative
science program with groups of special
and regular class students. They found
that learning-potential status best differen- 
tiated levels of attainment following this 
course, while neither IQ nor special versus 
regular class placement could distinguish 
levels of achievement after the course. 

Judging by the success of the learning- 
potential approach, all successful teaching 
programs would seem to profit by incorpo- 
rating the following: the initial utilization 
of channels in which children are less deficient,
the provision of the necessary preparatory
learning experiences, and the creation of
subjective feelings of mastery and success.


TABLE 2
Coefficients of Correlation of SLPT and TOGA Scores with Teacher Ratings

Predictor columns, left to right: Total sample, Subnormal sample, Dull-to-average sample, Bright-normal sample (SLPT, then TOGA, within each sample)

Teacher ratings
Reading .50|.50.45| .49).641.24) .28|—.01| .20.12 09.35.39, -271:2
x .49).54).24) .28|—.01| .20|.12/.35| .35|.05| .15] 09] .35] 33) -
Spelling’ .45|.49|.45| .47|.88].26| .38| .19| .20|.20).23| .26).08) .17|.17 .32).32 EE
Arithmetic 249.5340, -60) 62{-a5| -49| -18| -39.18.27| .29.14| 16.18) 43].40| Dy
General achievement |.51).54/.49| .52|.56).33) 40) .07| .25).15).31) .34) 08) .22|.19,.40|.38 .33 3
Academic potential |.49|.50/.43| .47|.50.27| .22| .11| .01|.13|.35| .44|.04| .23|.11|.97 .35 EE
Average correlation|.49|.51|.46| .49].53].29| .35| .11| .20|.10,.30| .34|.08| .19|.15|.37 .85| 20^


Note. Abbreviations: SLPT = Series Learning Potential Test and TOGA = Test of General Ability.

INSTRUCTIONAL OBJECTIVES AS DIRECTIONS TO LEARNERS:
EFFECT OF PASSAGE LENGTH AND AMOUNT OF
OBJECTIVE-RELEVANT CONTENT¹


ROBERT KAPLAN²
Bell Laboratories, New Brunswick, New Jersey


ERNST Z. ROTHKOPF 
Bell Laboratories, Murray Hill, New Jersey 


The effects of number of objective-relevant sentences on learning from 
texts ranging in length from about 800 to 3,000 words were investigated 
in two related experiments. The main findings were (a) the use of ob- 
jectives as directions resulted in increased performance; (b) the likeli- 
hood of mastery of any objective (intentional learning) decreased 
with the number of objective-relevant sentences but was not related 
to passage length; (c) incidental learning decreased with passage 
length; (d) study time increased with the number of objectives and 
with passage length; and (e) specifically stated objectives resulted in 
greater intentional learning than generally stated objectives. 


The use of statements of explicit goals as 
directions to learn from text materials has 
been investigated in a study by Rothkopf 
and Kaplan (1972). They found that ex- 
plicit objectives resulted in better learning 
performance than did directions to learn as 
much as possible, both on test items that 
were relevant to the objectives and on incidental
test items. Specifically described objectives
were found to produce higher performance
than more generally stated objectives.
Further, the likelihood of mastering
any given objective was found to decline
with increases in the density of objective- 
relevant sentences in the text. 

The present experiments had three main 
purposes. First, an attempt was made to 
replicate the previous experiment with an 
extended range of text lengths and densities 
of objective-relevant text components. The 
second purpose was to clarify the effective 
elements of the density variable. This fac- 
tor, which was defined in the previous study 


¹ We are indebted to E. M. Burgin and D. M.
Lynch for their valuable assistance
throughout all phases of the experiments.

² Requests for reprints should be sent to Robert
Kaplan, Bell Laboratories, CP1 L101, P.O. Box
2020, New Brunswick, New Jersey 08903.


as the ratio of sentences judged to be objective
relevant to the total number of sentences
in the text, actually involves several
distinguishable but correlated variables.
These include (a) the number of relevant
sentences in the text, (b) the ratio of relevant
sentences to the total number of sentences
in the text, and (c) the number of
sentences in the direction specifying these
objectives to the subject. The purpose of the
two studies reported here was to determine
whether Variables a and b influenced
the likelihood of mastering any given objective.
Another important goal was to determine
whether the specificity and number of
objectives affected the amount of time used
for studying the text.


EXPERIMENT 1 


Method 


Materials 


Passages. Three experimental passages from
a previous study (Rothkopf & Kaplan, 1972) were
adapted for use in this experiment. The passages
were selected from three textbooks prepared by
the Systems Training Department [...]
information about printing specifications for
[...]ing forms (Passage 1) and an introduction to busi-
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ness information systems (Passage 2) and total 
system development (Passage 3).³ The passages
comprised 842, 1,091, and 1,120 words and 60, 54, 
and 55 sentences, respectively. 

Passage length. The three basic passages were
used singly and in combinations to form eight
experimental texts, 56, 113, and 169 sentences in
length. Length 56 was achieved by using separately
each of the three basic passages (X̄ length = 56
sentences). The second passage length, Length 113, was
achieved by combining two Length 56 passages.
Three such two-passage combinations were used:
Passages 2, 1; 2, 3; and 3, 1 (X̄ length = 113
sentences). Two versions of the longest experimental
passage, Length 169, were constructed by combining
the three basic passages in the orders 2, 3, 1 and
2, 1, 3 (X̄ length = 169 sentences).
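The arithmetic behind the three nominal passage lengths can be checked with a short sketch. The sentence counts (60, 54, and 55) and the passage orders are taken from the text above; the names `BASIC_SENTENCES`, `TEXT_ORDERS`, and `mean_length` are hypothetical and introduced only for illustration.

```python
# Sentence counts of the three basic passages, as reported in the text.
BASIC_SENTENCES = {1: 60, 2: 54, 3: 55}

# Passage orders used for each nominal length, as listed in the text.
TEXT_ORDERS = {
    56: [(1,), (2,), (3,)],
    113: [(2, 1), (2, 3), (3, 1)],
    169: [(2, 3, 1), (2, 1, 3)],
}


def mean_length(orders):
    """Mean number of sentences over the texts built from the given orders."""
    totals = [sum(BASIC_SENTENCES[p] for p in order) for order in orders]
    return sum(totals) / len(totals)


# Mean sentence count for each nominal length, and the total number of texts.
MEAN_LENGTHS = {nominal: mean_length(orders) for nominal, orders in TEXT_ORDERS.items()}
N_TEXTS = sum(len(orders) for orders in TEXT_ORDERS.values())
```

Under these counts the eight texts average roughly 56, 113, and 169 sentences, which agrees with the nominal labels used throughout the design.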

Objectives. The objectives were the same as 
those used by Rothkopf and Kaplan (1972). An 
objective was defined as a direction to the subject 
to learn a particular instructional point in the 
passage. Two types of objectives were prepared. 
First, a set of specifically phrased objectives (relevant
to one text sentence) was written for each
Length 56 passage. Second, a matching set of generally
phrased objectives (relevant to clusters of
2-5 adjacent text sentences) was prepared for each 
Length 56 passage. The relevance of passage sen- 
tences to objectives was empirically determined in 
a preliminary study (for details see Rothkopf & 
Kaplan, 1972). The match between 1 general ob- 
jective and 2-5 specific objectives was based on 
their sharing the exact same relevant sentences in 
the passage. 

Four sets of specific and general objectives were 
selected such that their relevant sentences in the 
passage resulted in four densities (40%, 60%, 75%, 
and 85%) for all passage lengths. This designation 
of the four density levels refers to the approximate 
proportion of objective-relevant sentences in the
passage. All sets of objectives were about evenly 
distributed throughout the passage. Increases in 
density for a given passage length were produced 


³ We are grateful to F. L. Stevenson, Head,
Systems Training Department, Bell Telephone 
Laboratories, New Brunswick, New Jersey, for per- 
mitting us to use the experimental material. 

⁴ An example of a cluster of three specific objectives
was, “Learn about the physical appearance
of Gothic type. Learn about the physical appearance
of Italic type. Learn about the physical appearance
of Roman type.” The matching general
objective for this cluster of specific sentences was,
“Learn about the physical appearance of the three
kinds of type discussed.” Both the general and the
specific objectives were relevant to the following
three adjacent text sentences: “Gothic type has
no serifs or doodads to confuse the reader. Conversely,
Italic type does have serifs and a distinctive
slant causes it to stand out. The physical
appearance of Roman type is similar to Italic in
that it has serifs, but Roman type does not slant.”
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by nesting; that is, higher densities were produced 
by adding to the objectives used in the lower 
densities. The mean number of relevant sentences 
for the four densities of Length 56 was 22, 34, 41, 
and 48. It should be noted that only the first group 
of 22 sentences, which comprised the 40% density, 
were common to or reproduced across all densities 
(common intentional). No objectives were written 
for those sentences that remained after the sen- 
tences of 85% density were selected. These remain- 
ing sentences provided a measure of incidental 
learning that was common to every density level 
(common incidental). The density levels for Length 
113 and Length 169 were held constant by combin- 
ing the objectives for the respective Length 56 
passages. The number of objectives and the total 
number of sentences varied across passage length, 
but the proportion of objectives to total sentences 
(density) remained the same. 
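As a rough check on the density designations, the proportion of objective-relevant sentences can be computed from the mean counts reported for Length 56. This is a minimal sketch: the counts (22, 34, 41, and 48 of 56 sentences) come from the text, while the function name `density` and the constant names are hypothetical.

```python
def density(relevant_sentences, total_sentences):
    """Proportion of sentences in a passage that are objective relevant."""
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    return relevant_sentences / total_sentences


# Mean numbers of relevant sentences for the four densities of Length 56.
LENGTH_56_RELEVANT = [22, 34, 41, 48]

# Densities as whole percentages, for comparison with the nominal labels.
DENSITY_PERCENTS = [round(100 * density(n, 56)) for n in LENGTH_56_RELEVANT]
```

The computed percentages are close to, but not identical with, the nominal 40%, 60%, 75%, and 85% labels, which is consistent with the text's statement that the designations are approximate.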

Tests. Completion-type test questions were con- 
structed by removing one or two substantive words 
from an approximate paraphrasing of passage sentences
and replacing the word(s) with a line of uniform
length. A test question was constructed for
almost every sentence in the basic Length 56 passages.
This meant that there was a test question
for every sentence relevant to an objective (intentional
items) and for most of the nonobjective
sentences (incidental items) for each treatment.
Combinations of these Length 56 test questions
were used for testing Length 113 and Length 169.


Procedure 


Groups of approximately 100 subjects were 
tested in each experimental session. Group proce- 
dures, following those of Rothkopf and Coke 
(1968), were used. Each subject received a package 
of materials consisting of three sequentially num- 
bered manila envelopes that subjects completed in 
the numbered order. The three numbered envelopes 
contained, respectively, (a) a set of objectives and 
a passage, (b) a test on the assigned passage, and 
(c) a nonexperimental task to occupy the subject 
while the other subjects completed their experi- 
mental task. For the reference groups, Envelope 1 
contained a passage and conventional directions to 
learn everything in the passage. The experimental 
subjects were informed that they would be tested 
only on the information in the passage that was 
relevant to the list of objectives. However, they 
were tested on almost every sentence in the
passage. This procedure permitted testing of inci- 
dental (no objective) as well as intentional (with 
objectives) learning. The list of objectives was 
available to the subjects while they read the 
passage. The objectives were listed in the same 
sequence that the corresponding relevant sentences 
occurred in the passage. 

Eighteen subjects were assigned to each of 24
treatments (n = 432). Fifty-four additional subjects
comprised three reference groups, one for each
passage length (n = 18 per length). These groups


FIGURE 1. Mean proportion of correct responses for intentional and
incidental test items that were common to all treatments as a function of
passage length, density, and specificity of objectives. (Reference [REF] groups
that received no objectives are shown for each passage length.)


read the passages with directions to learn every- 
thing in the passage. This treatment was similar 
to conventional directions in learning experiments 
which do not usually present objectives but give 
broad directions to learn everything possible. 


Subjects 


Students from the following seven New Jersey 
high schools served as paid volunteers in the ex- 
periment: Battin High School, Elizabeth; Ber- 
nards High School, Bernardsville; Caldwell High 
School, West Caldwell; Arthur L. Johnson Re- 
gional High School, Clark; Livingston High 
School, Livingston; Somerville High School, 
Somerville; and Westfield High School,
Westfield.⁵ They consisted of 177 males and 309 females
between the ages of 13 and 18 (n = 486). The ex- 
perimental sessions were conducted in the high 
school cafeterias after the last school period. 


Results
Intentional Learning 


The objective-relevant test items that
were common to all treatments were analyzed
with a 2 × 3 × 4 analysis of variance
containing (a) two levels of objective specificity
(specific and general); (b) three levels
of passage length (56, 113, and 169 sentences);
and (c) four levels of density (40%,


⁵ We extend special appreciation to the following
high school principals for recruiting student
participants and permitting the use of their facilities:
D. Parisi, T. Froisland, N. Bussiere, R.
Hough, L. Hurley, M. Crisci, and A. Bobal,
respectively, for the listed schools.

60%, 75%, 85%).⁶ Arc sine transformations
were used on the proportion of correct responses.
Treatment means are shown in Figure 1.
The analysis supports the following
conclusions.

The specifically stated objectives (X̄ =
.37) produced higher intentional learning
scores than did the generally stated objectives
(X̄ = .31; F = 8.61, df = 1/408, p <
.01). As seen in Figure 1, the likelihood of
mastering any single intentional item generally
decreased with passage length (X̄
Lengths 56, 113, and 169 = .36, .36, and .29,
respectively; F = 4.69, df = 2/408, p <
.01). The likelihood that any given intentional
item was correctly recalled tended to
decline with increases in density, although
the 75% density condition for Passage
Length 56 appeared to be aberrant in this respect
(X̄ Densities 40%, 60%, 75%, and 85%
= .37, .34, .35, and .29, respectively; F =
2.89, df = 3/408, p < .05). The density effect
was somewhat more pronounced for the
specific than the general objectives treatments.
None of the interactions reached significance.


Incidental Learning 


The nonobjective-relevant test items that
were common to all treatments were analyzed
with a 2 × 3 × 3 analysis of variance.


⁶ Appreciation is extended to J. R. Gossman for
his participation in the analysis of data.


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The first two factors were the same as those 
for intentional learning. However, the third 
factor (density) was reduced to three levels 
(40%, 60%, and 75%) in order to increase 
the number of incidental learning test items 
that were common to all densities. That is, 
there were only four items common to all 
conditions which were available to measure 
Length 56 incidental learning when 85%
density was included in the analysis. The
elimination of 85% density permitted 12
items common to all conditions to be used
as a measure of incidental learning. Arc sine
transformations were again used on the pro- 
portion of correct responses. The incidental 
learning means are given in Figure 1. 
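For readers unfamiliar with the arc sine transformation applied to proportions before an analysis of variance, the following is a minimal sketch of one common form. The text does not state which variant was used, so the formula 2·arcsin(√p) and the function name `arcsine_transform` are assumptions made for illustration only.

```python
import math


def arcsine_transform(p):
    """Variance-stabilizing transform of a proportion (assumed form: 2*asin(sqrt(p)))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a proportion must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


# Example: transform the overall incidental and intentional means reported in the text.
TRANSFORMED_MEANS = {p: arcsine_transform(p) for p in (0.26, 0.34)}
```

The transform stretches proportions near 0 and 1 so that their sampling variance depends less on the mean, which is the usual reason for applying it before an analysis of variance on proportion-correct scores.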

Specificity of objectives and density of 
objective-relevant sentences in the passage 
had little or no effect on incidental learning 
(F = .42, df = 1/306 and F = .65, df =
2/306, respectively). The passage length
main effect was significant (X̄ Lengths 56,
113, and 169 = .27, .28, and .23, respectively;
F = 3.21, df = 2/306, p < .05). The
prevailing trend was that the likelihood of 
correct response to any given incidental item 
decreased as a function of passage length. 
None of the interactions were significant. As 
expected, performance on incidental items
(X̄ = .26) was poorer than on intentional
items (X̄ = .34).


Reference Group Comparisons 


The reference treatment resulted in 
weaker performance with respect to both in- 
tentional and incidental test items than did 
the experimental treatment. The only excep- 
tion was the incidental learning score for the 
60% density/specific objectives condition in
the Length 56 passages. All comparisons be- 
tween reference and treatment groups were 
made with t tests. Figure 1 shows that intentional
learning scores for the treatment
groups were higher than the reference group
scores (Length 56, t = 4.13, df = 160, p <
.001; Length 113, t = 4.65, df = 160, p <
.001; Length 169, t = 3.90, df = 160, p <
.001). In addition, t tests were used to compare
treatment with reference groups for incidental
learning. Length 56 incidental
learning scores were not significantly differ-
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ent from the reference group scores (t = 
1.28, df = 160, p > .05). However, significantly
higher incidental learning was found
for the Length 113 treatment group over
the reference group (t = 2.23, df = 160, p <
.05) and for the Length 169 treatment group
over the reference group (t = 1.79, df = 160,
p < .05). Similar to the incidental learning
scores of the experimental treatments, the
reference group scores generally decreased
as a function of passage length (X̄ Lengths
56, 113, and 169 = .20, .19, and .16, respec- 
tively). This decrease was not statistically 
reliable. 
Discussion 

The results of Experiment 1 replicated all 
of the major findings reported by Rothkopf 
and Kaplan (1972) with passages up to 
three times as long as those in the original 
study and with a much greater range of 
densities. The new result in Experiment 1 
was that both intentional and incidental 
learning were observed to decrease with 
passage length. It should be noted, however, 
that it is impossible to determine from Ex- 
periment 1 whether this finding with respect 
to intentional learning was due to passage 
length or the greater number of objectives 
in the longer passages. This was due to the 
fact that experimental densities of objective- 
relevant sentences were held constant for 
all three passage lengths. Consequently, the 
average absolute number of objective-rele- 
vant sentences was greater for the longer 
than for the shorter passages. 


EXPERIMENT 2 


The primary purpose of Experiment 2 
was to determine whether the decremental 
effects on intentional learning were due to 
increases in density or the absolute number 
of objective-relevant sentences. This was 
accomplished by observing learning per- 
formance associated with various passage 
lengths when the number of objective-rele- 
vant sentences was the same for each pas- 
sage length but density was permitted to 
vary. Under these experimental arrange- 
ments, it was possible to (a) evaluate the 
effect of passage length with the number of 
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objectives held constant and (b) 'evaluate 
the effect of a given number of objectives 
for various passage lengths. A density effect 
on learning would be indicated if, for a 
given number of objectives, performance 
was observed to be stronger for longer 
passages. On the other hand, decrements 
due to an increasing number of objectives 
but constant levels of performance for any 
given number of objectives across various 
passage lengths would indicate that in- 
creasing the number of objectives was re- 
sponsible for the “density” decrements ob- 
served in Rothkopf and Kaplan (1972) and 
in Experiment 1. Another purpose of Experi- 
ment 2 was to obtain inspection time data 
for each treatment. 


Method 


Design and Procedure 


The passages and basic objectives of Experi- 
ment 1 were used here. The data collected from 
six treatments in Experiment 1 were combined 
with five new experimental groups to produce an 
experimental design that involved three passage 
lengths (56, 113, and 169 sentences), each of which 
was used with objectives referring to 22, 34, 44, 
and 100 relevant sentences (except, of course, the 
impossible combination of Passage Length 56 with 
100 relevant sentences). All combinations of pas- 
sage length and number of relevant sentences 
were used with both specifically and generally 
phrased objectives. This led to an incomplete fac- 
torial design involving three passage lengths, four 
levels of relevant sentences, and two degrees of 
specificity in the phrasing of objectives for a total 
of 22 experimental treatments. Three reference 
groups, one for each passage length and identical 
to those in Experiment 1, were also used in the 
analysis. The procedure for administering the ex- 
perimental materials and the test questions used 
in Experiment 2 were the same as those used in 
Experiment 1. 
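The incomplete factorial design described above can be enumerated directly. This is an illustrative sketch: the factor levels and the 18 subjects per group come from the text, and the constant names and the rule used to drop the impossible cell (more relevant sentences than the passage contains) are assumptions for the purpose of the example.

```python
from itertools import product

PASSAGE_LENGTHS = (56, 113, 169)        # sentences
RELEVANT_SENTENCES = (22, 34, 44, 100)  # objective-relevant sentences
SPECIFICITY = ("general", "specific")
SUBJECTS_PER_GROUP = 18
N_REFERENCE_GROUPS = 3                  # one per passage length

# Keep only cells in which the relevant sentences fit inside the passage;
# this drops Length 56 with 100 relevant sentences for both specificities.
TREATMENTS = [
    (length, relevant, spec)
    for length, relevant, spec in product(PASSAGE_LENGTHS, RELEVANT_SENTENCES, SPECIFICITY)
    if relevant <= length
]

TOTAL_SUBJECTS = SUBJECTS_PER_GROUP * (len(TREATMENTS) + N_REFERENCE_GROUPS)
```

Counting the cells this way gives the 22 experimental treatments stated in the text, and 18 subjects in each treatment and reference group gives 450 subjects in all.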

Three dependent measures were used: perform- 
ance on (a) intentional and (b) incidental test 
items common to all treatments and (c) passage 
inspection time. The use of Measures (a) and (b) 
was described in Experiment 1. The third measure, 
inspection time, was obtained by requiring the 
subject to record the start and stop reading times
using a large digital clock. The experimental effects 
on these three measures were evaluated with two 
separate analyses because of the missing cell in the 
incomplete factorial design. 


Subjects 


The data from 216 students who participated in
Experiment 1 were used here. In addition, 180
students selected from the same high schools
and participating simultaneously with those in
Experiment 1 completed the number of subjects
required for the treatment groups of this experiment.
The same 54 students served as a reference group
in both experiments. Altogether, there were 450
subjects consisting of 158 males and 292 females.
This permitted the assignment of 18 subjects to
each of the 22 treatments and the 3 reference
groups.
Results 


Common Intentional and Incidental 
Items 


Two analyses were performed for intentional
and incidental items that were
common to all treatments. The first was a
2 × 3 × 3 × 2 analysis of variance with (a)
two levels of objectives (general and specific);
(b) three levels of relevant sentences
(22, 34, and 44 sentences); (c) three levels
of passage length (56, 113, and 169 sentences);
and (d) two levels of learning (intentional
and incidental) with repeated measures
on the last factor. Arc sine transformations
were used for proportion scores. The
means for these data are shown in Figure 2
and support the conclusions below.

In general, larger numbers of relevant
sentences resulted in a smaller likelihood
of successful performance on any given intentional
item except for the aberrant reversal
between the 34- and 44-sentence conditions.
The effect due to number of relevant
sentences was significant (F = 4.01, df =
2/306, p < .05). Figure 2 also indicates that
performance on intentional items for any
given number of relevant sentences appeared
relatively constant across various passage
lengths.

By holding the number of relevant sentences
constant across the various passage
lengths, the passage length effect on intentional
learning was removed (X̄ Lengths
56, 113, and 169 = .38, .39, and .38, respectively).
Paired comparisons among the
conditions were made with Newman-Keuls
tests. The largest difference between Length
56 and Length 113 yielded a q of 1.29 (df =
306, r = 2, p > .05). This suggests that the
passage length effect on intentional learning
observed in Experiment 1 (see Figure
1) was due to a correlation between density
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and passage length; that is, when passages of 
various lengths were matched with respect 
to density, as in Experiment 1, the number 
of relevant sentences was greater for the 
longer passages. 

The difference between general (X̄ = .29)
and specific (X̄ = .34) objectives for intentional
and incidental learning was not
statistically reliable (F = 3.36, df = 1/306,
.05 < p < .10). One interaction that reached
significance was between objectives and
learning (F = 14.63, df = 1/306, p < .001).
This was a consequence of the fact that, for
intentional learning, specific objectives (X̄
= .43) resulted in higher scores than general
objectives (X̄ = .34; q = 5.00, df = 306,
r = 2, p < .001), while there were no differences
between specific (X̄ = .24) and
general (X̄ = .24) directions on incidental
learning (q = .50, df = 306, r = 2).

Performance on intentional items (X̄ =
.38) was better than on incidental items
(X̄ = .24; F = 89.56, df = 1/306, p < .001).
There appeared to be no systematic relationship
between incidental scores and the number
of relevant sentences. Performance on
incidental items did, however, tend to decrease
with passage length although not
significantly (X̄ Lengths 56, 113, and 169 =
.32, .25, and .16, respectively).

The Number of Relevant Sentences ×
Passage Length interaction (F = .79, df =
4/306) was not significant. That is, for a
given number of relevant sentences, varia- 
tions in passage length yielded similar 
scores. This finding further suggests that the 
absolute number of intentional sentences, 
rather than density, influenced performance 
on intentional items. 

Another analysis of variance (2 × 4 ×
2 × 2) was performed on these data in order
to permit the inclusion of 100 objectives in 
the analysis of Passage Lengths 113 and 
169. The factors were (a) two levels of ob- 
jectives (general and specific); (b) four 
levels of relevant sentences (22, 34, 44, and 
100 sentences); (c) two levels of passage 
length (113 and 169 sentences); and (d) 
two levels of learning (intentional and in- 
cidental). Figure 2 also includes the means 
for these data. The results were the same
as those found in the first analysis except


FIGURE 2. Mean proportion of correct responses
for intentional and incidental test items that were
common to all treatments as a function of passage
length. (Data points for treatments with the same
number of objective-relevant sentences but different
passage lengths are connected; Sent =
sentences.)


that here performance with specific objectives
(X̄ = .31) was significantly greater
than with general objectives (X̄ = .26; F =
7.03, df = 1/272, p < .01). The number of
relevant sentences main effect was significant
(X̄ 22, 34, 44, and 100 sentences = .33,
.26, .29, and .27, respectively; F = 3.06,
df = 3/272, p < .05). The Objectives ×
Learning interaction was again significant
in this analysis (F = 8.79, df = 1/272, p <
.01).


Inspection Time 


All analyses of inspection time were per- 
formed on log transformations of time data 
in order to reduce the effect of extremely 
long times. The antilogs of the mean log for 
time data for each treatment and for the 
reference groups are presented in Table 1. 
Two analyses were performed on log trans- 
formations of the inspection times. The de- 
signs of these analyses were identical to the 
two test performance analyses previously 
presented in this experiment. 
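The "antilog of the mean log" used for the inspection times is the geometric mean, which is less sensitive to a few very long times than the arithmetic mean. The sketch below is illustrative only: the function name `antilog_of_mean_log` is hypothetical, base-10 logarithms are an assumption (any base gives the same result), and the sample values are the three Length 56 general-objectives cell entries from Table 1 rather than the raw per-subject times, which are not given in the text.

```python
import math


def antilog_of_mean_log(times):
    """Geometric mean: 10 raised to the mean of the base-10 logs of the times."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be a non-empty sequence of positive values")
    mean_log = sum(math.log10(t) for t in times) / len(times)
    return 10 ** mean_log


# Cell entries for Length 56 with general objectives (22, 34, and 44 relevant sentences).
ROW_56_GENERAL = [12.47, 13.03, 14.88]
ROW_56_MARGINAL = antilog_of_mean_log(ROW_56_GENERAL)
```

Applied to these three cell entries, the function gives a value close to the 13.40 reported in the table's marginal column; exact agreement is not expected because the published cell entries are themselves rounded.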

The first analysis was a 2 × 3 × 3 design




454 ROBERT KAPLAN AND ERNST Z. ROTHKOPF 
TABLE 1
ANTILOGS OF MEAN LOG INSPECTION TIME

Passage |          General objectives           |          Specific objectives
length  |        No. relevant sentences         |        No. relevant sentences
        |  22   |  34   |  44   |  100  |   A   |  22   |  34   |  44   |  100  |   A
56      | 12.47 | 13.03 | 14.88 |  --   | 13.40 | 17.14 | 16.98 | 25.88 |  --   | 19.63
113     | 22.28 | 20.14 | 16.79 | 26.61 | 21.19 | 23.77 | 29.38 | 25.47 | 25.53 | 25.94
169     | 25.23 | 29.58 | 29.24 | 38.02 | 30.20 | 25.53 | 32.66 | 35.32 | 39.63 | 32.89
A       | 19.14 | 19.82 | 19.41 | 31.77 | 21.26 | 21.83 | 25.35 | 28.58 | 31.77 | 26.21


Note. A = antilog of mean log across conditions. 


with (a) two levels of objectives (general 
and specific); (b) three levels of relevant 
sentences (22, 34, and 44 sentences) ; and (c) 
three levels of passage length (56, 113, and 
169 sentences). Figure 3 shows the antilogs 
of mean log inspection time for the number 
of relevant sentences condition as a function 
of passage length collapsed across general 
and specific objectives. The results obtained 
are reported below. 

More time was spent inspecting passages 
with specific objectives than general objectives
(F = 25.29, df = 1/270, p < .001).
Inspection time increased as a function of 
the number of relevant sentences, but this 
effect did not quite reach significance (F = 
2.68, df = 2/270, .08 > p > .07). Reading 
time increased as a function of passage 
length (F = 45.80, df = 2/270, p < .001). 
Newman-Keuls analyses showed Length
169 > Length 113 > Length 56 (q = 5.95
and 7.63, df = 270, r = 2, p < .01, respectively).
The only interaction to attain significance
was Relevant Sentences × Passage
Length (F = 2.99, df = 4/270, p < .05).
This significant interaction was primarily
a result of the 44-sentence condition that
did not increase between Lengths 56 and
113.

The second inspection time analysis was
a 2 × 4 × 2 design with (a) two levels of
objectives; (b) four levels of relevant sen- 
tences (22, 34, 44, and 100 sentences); and 
(c) two levels of passage length (113 and 
169 sentences). The results of this analysis 
follow (see Figure 3). 

Passages with specific objectives again required
more reading time than did those
with general objectives (F = 7.66, df =
1/240, p < .01). In this analysis, the number
of relevant sentences main effect was
significant (F = 5.02, df = 3/240, p < [...]).
Newman-Keuls analyses showed that 100
relevant sentences required more time than
22, 34, and 44 relevant sentences (q = [...],
df = 240, r = 4, p < .01; q = 2.74, df =
240, r = 3, p < .01; q = 3.95, df = 240,
r = 2, p < .01, respectively). In addition,
34 relevant sentences required more
time than 22 relevant sentences (q = [...],
df = 240, r = 2, p < .01). Again, the
Length 169 passage required more reading
time than the Length 113 passage (F =
31.89, df = 1/240, p < .001). None of the
interactions reached significance in this
analysis.

Figure 3 indicates that inspection time for
every treatment exceeded the inspection
time for the corresponding reference group.
Inspection time increased approximately
linearly as the experimental passage became
longer and the number of relevant sentences
for the treatment groups became more
numerous.


Discussion 


The present experiments confirmed previous
observations (Rothkopf & Kaplan,
1972) that the likelihood of attaining any
given objective was reduced as the number
of relevant sentences in the passage increased,
and provided important evidence
about the source of this effect. Experiment
2 indicates that the observed decline in performance
was due to increases in the absolute
number of objective-relevant sentences
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in the text and was relatively independent 
of passage length. The density of relevant 
sentences in a passage is therefore not an 
appropriate characterization of the experi- 
mental factor that produced decreases in 
the probability of correct performance on 
any given objective as a function of the in- 
crease in the number of relevant sentences. 

This characterization was tentatively used
in the Rothkopf and Kaplan (1972) study
in which only one passage length was used.
Experiment 2, in which both the number of
relevant sentences and passage length were
systematically varied in a factorial design,
showed that performance on any given objective-relevant
item was relatively constant
for any given number of relevant sentences
across passages of different lengths.
It would not be reasonable, however, to ex- 
tend these conclusions beyond the number 
of relevant sentences and passage lengths 
used in the present experiment. It does not 
seem plausible that the likelihood of master- 
ing any given objective would be the same 
in an extremely short and an extremely long 
experimental passage. The passages used in 
the present experiment are, however, within 
the range of lengths customary in school 
reading assignments. 

The relationship between the likelihood 
of correct performance on any given ob- 
jective and passage length, observed in Ex- 
periment 1, can be accounted for in terms of 
the performance constancy associated with 
number of relevant sentences. In Experiment 1,
passages of three different lengths
but with identical density levels were used.
Consequently, the average number of rele- 
vant sentences was greater for the longer 
passages even though the passages of various 
lengths were matched for density. The main 
effect on intentional learning due to passage 
length can therefore be understood in terms 
of the increase in average number of rele- 
vant sentences associated with increases in 
passage length. 

The two experiments replicate several of 
the findings reported by Rothkopf and 
Kaplan (1972) with passages up to about 
1,500 words in length and with sets of objectives
that encompass up to 100 sentences
in a passage. These findings include


FIGURE 3. Antilogs of mean log inspection time
for treatments involving various numbers of
objective-relevant sentences as a function of passage
length (Sent = sentences).


the following: (a) Providing instructional 
objectives increases the amount of objective- 
relevant material that is learned as well as 
the learning of incidental background ma- 
terial over the performance levels achieved 
by suitable control groups who were di- 
rected to learn as much as possible, and (b) 
specifically stated objectives resulted in 
better performance than more generally 
phrased objectives. 

The finding that directions to learn par- 
ticular content in a text also facilitates the 
acquisition of incidental material is at 
variance with results recently reported by 
Duchastel (1972). He found that subjects 
who were given objectives performed some- 
what more poorly on incidental material 
than a nonobjective control group. The rea- 
son for the difference between the results of 
these experiments is not clear. Directions to 
attain specific goals may interfere with
learning of incidental material when the 
number of objective-relevant sentences in 
a passage is relatively small. The density 
of goal-relevant sentences in the Duchastel 


study was only about 1%, whereas in our
studies the densities ranged from 13% to
85%.




The experimental treatments used in this 
study resulted in higher performance on 
the retention test than the appropriate con- 
trol reference groups. It was also observed, 
however, that the control conditions re- 
sulted in shorter inspection time and that in 
general the treatments associated with bet- 
ter learning performance tended to involve 
longer inspection time. Two comments about 
the relationship between learning and in- 
spection time should be made, since this 
relationship appears to have some bearing on 
practical decisions. 

First, the ratio of correct test responses to 
average inspection time in Experiment 2 was 
higher for the treatment groups of Passage 
Length 169 (.52) than the control group 
(.38). The same relationship held for Pas- 
sage Length 113, where the ratio of correct 
responses to inspection time was .66 for the 
treatment groups and .44 for the control.
This would suggest that the treatment is in
both cases practically preferable over the 
control, since both learning score and the 
average rate of achievement of this learn- 
ing score in time were greater for the treat- 
ments. 

For Length 56, however, the ratio of cor- 
rect intentional learning responses to in- 
spection time is greater for the control (1.21) 
than the treatment (.76). Whether this im- 
plies that the control group is practically 

preferable over the treatment for Length 56 
passages depends on the relationship be- 
tween achievement and inspection time. If 
it is assumed that with increased oppor- 
tunity for study, inspection time produces 
linear increases in test performance until 
mastery is achieved, then the control group 
appears preferable for the 56-sentence pas- 
sage. This is because slope alone, that is, 
the ratio of correct response to inspection 
time, is sufficient to predict when mastery 
will occur. Assumptions of various curvi- 
linear relationships between performance 
and inspection time may support other de- 
cisions. Assumptions of this type would not 
approximately 750 and 1,500 words in
length, the average number of correct
responses increased in a negatively accelerated


time. If the relationship between performance
and time increases monotonically but is
negatively accelerated, the treatment with
higher performance but lower performance/time
ratio may be preferable to the other
treatments if all treatments result in the
same acquisition curves but vary in how
much time is devoted to study when the
opportunity occurs.
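
The dependence of this decision on the assumed acquisition curve can be illustrated with a small numerical sketch. The exponential curve, its ceiling and rate parameters, and the two study times below are hypothetical choices made only for illustration; they are not estimates from Experiment 1 or 2.

```python
import math


def performance(minutes, ceiling=40.0, rate=0.05):
    """Hypothetical negatively accelerated acquisition curve (assumed form)."""
    return ceiling * (1.0 - math.exp(-rate * minutes))


def ratio(minutes):
    """Correct responses per unit of inspection time."""
    return performance(minutes) / minutes


# Two groups that follow the SAME curve and differ only in study time.
short_study, long_study = 10.0, 30.0  # hypothetical inspection times

for label, t in (("short study", short_study), ("long study", long_study)):
    print(f"{label}: performance = {performance(t):.1f}, ratio = {ratio(t):.2f}")
```

Under this assumed curve the group that studies longer scores higher yet shows the lower performance/time ratio, so the ratio alone would not identify the less desirable treatment.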

The two experiments reported here have
not explored the effect on learned performance
of the length of the list of objectives
given to subjects as compared to the number
of objective-relevant sentences in the
passage. However, this is not a serious practical
problem, since these two factors are
in practice closely associated. They were
also correlated in this study.

The present results, together with those
previously reported by Rothkopf and Kaplan
(1972), strongly suggest that when
instructional objectives are known, substantial
gains in instructional effectiveness may be
obtained by communicating these objectives
to students embarking on reading
assignments.
ANOTHER VIEW OF THE RELATION OF ENVIRONMENT
TO MENTAL ABILITIES


CHESTER W. HARRIS¹ AND DAVID L. McARTHUR


University of California, Santa Barbara


Correlations between environment measures and mental ability test 
performance measures presented by Marjoribanks were examined for 
the presence of a factor, or factors, common between the two sets of 
measures. A single latent common factor was found to account, by 
itself, for all but a small portion of the total variance. This finding
supports the conclusion that the environment measures are not
differentially related to test performance but rather differ only in their
correlation with the single latent factor.


The purpose of this note is to propose 
another interpretation of the data presented 
by Marjoribanks (1972) relating environment
measures (press and status) to mental
ability test performance. Marjoribanks used
his data to search for “subenvironments or
subsets of environmental forces” that are
related to each of four abilities as measured 
by the SRA Primary Mental Abilities sub- 
tests. He presents correlations between eight 
“environmental forces” (press for achieve- 
ment, press for activeness, etc.) and the 
mental ability test scores, and he also gives 
correlations between “global” or status indi- 
cators of the environment (education of 
each parent, occupation of father, etc.) and
these same mental ability test scores. He 
found, in general, higher correlations be- 
tween press variables and test scores than 
between status variables and test scores. 
He also found that the spatial and reasoning 
subtests correlated less highly with the 
press variables than did the verbal and num- 
ber subtests. 

We wish to answer a slightly different 
question, using Marjoribanks’ data.² The
question is this: How many latent variables 
are required to “account for" or reproduce 


¹ Requests for reprints should be sent to Chester
W. Harris, Department of Education, University
of California, Santa Barbara, California 93106.
² We are indebted to Kevin Marjoribanks for
supplying us with corrected correlation matrices
after we found a slight error in the published data.


the correlations between environment mea- 
sures (press and status) and mental ability 
test scores? It can be seen in advance that 
the answer to this question may be critical 
to the interpretation of the observed dif- 
ferences in level of correlation. For example, 
it may be possible that the two types of en- 
vironment measures differ in their cor- 
relation with the test scores primarily be- 
cause the two types of measures (press and 
status) differ in the variance of their true 
score components and not because the true 
score components of the two types of en- 
vironment measures are measuring different 
constructs or latent measures. The analysis 
by Marjoribanks does not permit one to an- 
swer this kind of question. 

In order to answer the question of how 
many latent variables are required to ac- 
count for the cross-correlations between en- 
vironment measures and test scores, we used 
Tucker's interbattery factor analysis model 
(Tucker, 1958). Tables 3 and 5 of the origi- 
nal article (Marjoribanks, 1972) are the 
relevant ones for this analysis, but recall 
that some correction of the data presented 
originally is required. One can simply set 
Table 5 below Table 3 to give the complete 
14 x 4 matrix of cross-correlations between 
environment and test scores. (We eliminated 
the “Total” column, since the total score is 
simply a linear composite of the four sub- 
scores; verbal, number, spatial, and reason- 
ing.) This consolidated matrix of cross- 
TABLE 1
CORRELATIONS OF VARIABLES WITH INTERBATTERY FACTOR

Variable                      Correlation

Verbal ability                    .86
Number ability                    .86
Spatial ability                   .59
Reasoning ability                 .66
Press for achievement             .70
Press for activeness              .49
Press for intellectuality         .60
Press for independence            .39
Press for English                 .43
Press for “ethlanguage”           .31
Father dominance                  .15
Mother dominance                  .17
Education of father               .33
Education of mother               .17
Occupation of father              .43
Number of children               -.29
Crowding ratio                   -.32
Ordinal position                 -.24


correlations then can be analyzed in the 
manner described by Tucker to give the cor- 
relations of each of the variables with one or 
more factors common to the two batteries 
(interbattery factors). 

For this particular set of data, the ques- 
tion of how many interbattery factors are 
required 


ances), and it was found that the first eigen- 
value (5.093) closely approximated the total 


ment and the test variables. This is an important
conclusion, since it says that within
the context of their association with the four
test variables, the two types of environment
measures differ primarily in the magnitude
of their correlation with a single latent variable
but do not measure different aspects of
what is common i 


torially similar to the “status” variables but 
are better predictors of mental test scores, 




So far as these measures, used with this
sample, are concerned, the 14 environment
measures all tap the same aspect of what is
common with the test scores. Similarly, we
conclude that there is no evidence of “subsets
of environmental forces” that relate
differentially to the four test score variables.
In their relation with these environment
measures, the four test score variables
differ only in the magnitude of their correlation
with the single latent variable that is
common to the two batteries.

The correlation of each variable in each
of the two batteries with the single interbattery
factor can also be calculated. These
are given in Table 1, and they indicate the
differences referred to above. From these
correlations, it is clear that the single
environment variable that is correlated most
highly with this interbattery factor is “press
for achievement.” Its correlation with each
of the four test score variables can be
reproduced approximately by multiplying its
coefficient in Table 1 with the coefficient for
each of the four test score variables. Thus,
Marjoribanks (1972, Table 3) gives the
correlation of press for achievement with
Verbal as .66, and our values give .86 times
.70 equals .6020 as the approximation (see
Table 1).
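
The reproduction rule just described amounts to taking products of Table 1 coefficients. A minimal sketch follows; the loadings are copied from Table 1 for three of the environment measures, while the dictionary and function names are illustrative and not part of the original analysis.

```python
# Correlations with the single interbattery factor, taken from Table 1.
test_loadings = {
    "Verbal": 0.86,
    "Number": 0.86,
    "Spatial": 0.59,
    "Reasoning": 0.66,
}
environment_loadings = {
    "Press for achievement": 0.70,
    "Press for activeness": 0.49,
    "Press for intellectuality": 0.60,
}


def reproduced_correlation(environment, test):
    """Approximate a cross-correlation as the product of two loadings."""
    return environment_loadings[environment] * test_loadings[test]


for environment in environment_loadings:
    row = {test: round(reproduced_correlation(environment, test), 3)
           for test in test_loadings}
    print(environment, row)
```

For press for achievement and Verbal the product is .602, which can be set against the observed value of .66 cited above.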

It should be emphasized that this single 
interbattery factor provides a very good 
reproduction of the cross-correlations be- 
tween environment measures and test scores. 
However, it does not necessarily reproduce
the within-battery correlations.³ This is
simply to say that, for example, the environment
measures have further factors in common
with each other beyond (and independent
of) the single factor in common with the
test score variables. This is quite reasonable,
since measures like number of children in
the family and crowding ratio are measuring
much the same thing and thus tend to
have a factor in common independent of
their relationship with test score variables.
We are stressing that the interbattery analysis
focuses attention on the factors held in
common by the two batteries (environment


³ These within-battery correlations were also
kindly supplied by Kevin Marjoribanks.




and test scores) and must be supplemented 
by further analyses if all the factors char- 
acteristic of both batteries are desired. Our 
interest here was in this narrow focus, since 
it tends to support conclusions that differ 
somewhat from those given in the original 


study. 
ANOTHER VIEW OF THE RELATION OF ENVIRONMENT
TO MENTAL ABILITIES:
A REPLY


KEVIN MARJORIBANKS¹
University of Oxford, England


Harris and McArthur (1974) concluded that Marjoribanks' (1972) 
measure of the learning environment of the home was not differentially 
related to mental ability test performance. It is argued that the original
study did not propose that the two types of environmental measures 
were assessing different constructs and that Harris and McArthur's 
interbattery factor analysis of Marjoribanks’ data actually supports 


environmental measures, provides a better explanation of the data 
than Harris and McArthur's analysis.


Harris and McArthur have added little to our understanding of the relation
between environments and mental abilities by their extra analysis of the data
which appeared in Marjoribanks (1972), but they have presented an interesting
statistical discussion. The purpose of the original study was to examine the
relationship between mental ability scores and different indicators of the
same construct, namely, home environment. It was not proposed that the two
types [...] the global classificatory environmental characteristics [...] it
is proposed that the psychosocial parent-child interaction variables (a) have
greater diagnostic and functional value for the educator and (b) facilitate
our understanding of the basic nature and function of the mental abilities
themselves and of the characteristics of the environmental conditions which
influence mental ability development. Therefore, by indicating that the
environmental press variables are factorially similar to the social status
variables and are more highly correlated with the interbattery factor, the new
analysis provides support for the propositions of the original study.

In relation to the comment on the association between the environment
measures and differential mental abilities, a potentially more useful
statistical analysis has been completed in collaboration with Professor
Walberg of the University of Illinois [...] (1974) indicates, the results
published in the original study support only one dimension in common between
mental abilities and the environmental indicators. However, a canonical
correlation analysis of the complete set of intercorrelations between mental
abilities, social status indicators, and the environmental force scores (see
Table [...]) indicates a two-dimensional space that is common between the
abilities and the environment measures.


¹ Requests for reprints should be sent to Kevin
Marjoribanks, Department of Educational Studies,
Oxford University, 15 Norham Gardens, Oxford
OX2 6PY, England.

When canonical correlations are computed
between the eight environmental
forces and the six global environmental
indicators as a set of predictor variables,
the canonical loadings indicate that with
[Figure: plot of canonical loadings for the environmental and mental abilities
measures. Plotted points: achievement press, intellectual press, activeness
press, English press, independence, ethlanguage, mother dominance, father
dominance, occupation of father, education of mother, education of father,
birth order, number of children, crowding, verbal, reasoning, and spatial.
The legend distinguishes global environment indicators and mental ability
measures.]


respect to the first canonical variate, verbal
and number abilities and, to a lesser extent,
reasoning ability, are more closely associated
with the environmental force scores
than is spatial ability. The high loadings on
the first canonical variate also indicate that
the environmental forces contribute more
strongly to the prediction of the abilities
than do the social-status indicators and the
family structure characteristics.

After removing the variance of the first
canonical variate from predictors and criteria,
the loadings on the second variate
reveal that the social status indicators and
the environmental forces are significantly
related to differentially developed abilities
(Walberg & Marjoribanks, 1973). It should
be noted that the second variate explains



TABLE 2
CORRELATIONS BETWEEN THE TWO CANONICAL PREDICTOR VARIATES AND FOUR MENTAL ABILITIES

                  Simple r
Ability      Variate 1   Variate 2   Corrected R   [heading illegible]
Verbal        .72***      .15**       .73***          52.8
Number        .70***     -.18**       .73***          52.8
Spatial       .28***      .10         [illegible]      8.1
Reasoning     .40***      .14*        [illegible]     17.5

* p < .05.
** p < .01.
*** p < .001.


less variance in the mental ability scores
than does the first variate (see Table 2).
High ratings on press for English, father
occupation, press for ethlanguage, and to a
lesser extent, press for activeness and father
dominance, are associated on the second
variate with high scores on verbal, reasoning,
and spatial abilities, but associated
with lower number ability scores. The results
of the canonical analysis suggest the
hypothesis that environmental forces may
operate selectively to develop certain potential
abilities and to leave others relatively
underdeveloped.

By involving all the data from the orig- 
inal study, a canonical correlation analysis 
reveals a differential relation to the pre- 
dictor variables between number ability and 
the other three abilities, thus providing a 
different explanation of the data than that 
provided by an interbattery factor analysis. 
PSEUDO-ORTHOGONAL AND OTHER ANALYSIS OF VARIANCE
DESIGNS INVOLVING INDIVIDUAL-DIFFERENCES
VARIABLES


LLOYD G. HUMPHREYS¹ AND ALLEN FLEISHMAN
University of Illinois at Urbana-Champaign


In a critique of a paper by Humphreys and Dachler, Fischbach and
Walberg advised research workers who use individual-differences variables
in orthogonal analysis of variance designs to obtain equal Ns in
each cell and not to worry about population Ns. This is thoroughly
misleading advice and is based upon an inadequate model of components
of variance in individual-differences measures. On the basis of
present analyses of the problem, it is concluded that the analysis of


that correlational analysis has many advantages for such problems, 


cal ch and W; rg can be salvaged, when properly inter- 
preted, but other research involving this method should be discarded 
and a fresh start should be made with adequate design and methods 
of analysis.




In their discussion of Jensen’s (1968) theory of intelligence, Humphreys and
Dachler (1969a, 1969b) characterized the use by Jensen, and many others, of
two or more dichotomized, individual-differences variables in an analysis of
variance design [...] pseudo-orthogonal design. Humphreys and Dachler claimed
that whenever such variables are correlated (which is quite literally always)
in the population, the use of the pseudo-orthogonal design biases the
differences in means for the main effects relative to the differences in those
means that would be obtained in a single-factor experiment. For example, if
measures of intelligence and socioeconomic status are each dichotomized and
are used as the independent variables in an analysis of variance of
rote-learning scores, the contributions to variance of the independent
variables are changed by the use of equal Ns in the direction expected for
partial correlations. Thus, the contribution to variance of socioeconomic
status in the two-factor analysis is substantially smaller, as a function of
the pattern of intercorrelations, than in the single-factor analysis. Jensen,
and others, have not recognized this fact.


Equal versus Population Ns


Humphreys and Dachler (1969a, 1969b) contrasted the estimation of main effects
in which means were equally weighted (the pseudo-orthogonal analysis) with an
analysis which utilized the weighting of means by estimates of the Ns in the
population. Unfortunately, however, they committed two errors, one major and
one minor, in their discussion. The minor error was their failure to take into
account the bias in either weighting procedure when a variable is dichotomized
by the selection of extreme cases only, for example, upper and lower
quartiles. The major error was their failure to give sufficient emphasis to
their recommendation that investigators use correlational analysis rather than
analysis of variance for individual-differences variables. They contrasted two
methods of estimating main effects in order to demonstrate how conclusions
differ and, in doing so, almost lost sight of the main thrust of their
argument, which was to use correlational analysis and to avoid the analysis of
variance for such variables.

¹ Requests for reprints should be sent to Lloyd G. Humphreys, Department
[...], University of Illinois, Champaign, Illinois 61820.


Fischbach and Walberg Criticisms 


Subsequently, Fischbach and Walberg 
(1971) criticized the recommendations of 
Humphreys and Dachler (1969a, 1969b).
They concluded that the procedure of 
weighting means by estimates of the Ns in 
the population was biased and that the 
pseudo-orthogonal design with equal Ns in 
the four cells produced unbiased estimates. 
These conclusions are clearly erroneous. 
While the Humphreys and Dachler esti- 
mates were biased, because the middle 50% 
of the cases had been omitted from the dis- 
tributions of both independent variables, 
that bias is unrelated to the weighting-by-N 
procedure as such. 

Because Fischbach and Walberg (1971) 
were incorrect in their analysis of the two 
designs, they also concluded by giving in-
vestigators thoroughly undesirable advice. 
Their concluding paragraph summarized 
their position: 


Moreover, this model also implies that when
designing a study there is no need to be concerned
about the relative numbers of persons in each of
the four populations. [...]cal method of gaining
i[...] parameters is to sample each group equally
(unless, of course, there are sampling cost
differentials). Thus, the question of what the joint
distribution of IQ and socioeconomic status is like
becomes an irrelevant issue [p. 80].


Fischbach and Walberg Errors 


Fischbach and Walberg (1971) made two 
important errors in their development: one 
of omission and one of commission. With re- 
spect to the former, they failed to note that 
the arbitrary equalization of Ns for individ- 
ual-differences variables in order to impose 
orthogonality on the independent variables 
represents only an approximation to any of 
the three least squares solutions described
by Overall and Spiegel (1969) for the problem
of unequal Ns. It is not obvious that
equalizing Ns experimentally is the equivalent
of allowing for unequal Ns statistically
when individual-differences data are in- 
volved. This problem, it should be noted, did 
not occur to the latter authors, since they 
were concerned with experimental data. Un- 
equal Ns in experimental data carry no psy- 
chological meaning. 

The difference between experimental data 
involving unequal Ns and individual-differ- 
ences data lies at the heart of the error of 
commission of Fischbach and Walberg. An 
orthogonal components of variance model is 
simply not an appropriate model for individual-differences
variables. Instead, a
model like the multiple-factor model or multiple
regression model is appropriate. The
orthogonal components of variance model
does not include a socioeconomic status component
in the variance of a dependent variable
such as rote learning in a single-factor
design in which intelligence is the single factor.
Yet, it is practically impossible to reduce
the socioeconomic status component
present in the rote-learning scores to zero.
Furthermore, an approximate reduction to
zero variance of true scores, obtained by
holding constant the obtained score on
socioeconomic status, distorts relationships of
other individual-differences measures with
the rote-learning scores. Independence can- 
not be attained experimentally for individ- 
ual-differences variables. 


THEORETICAL DISCUSSION 


Basic Argument 


The initial background for the above as- 
sertions can be obtained from a very ele- 
mentary line of reasoning. If two variables 
are not orthogonal and each is dichotomized 
at the median, the Ns in the four quadrants 
will necessarily be unequal. If one wishes to 
obtain the mean of a row or column, this is 
the equivalent of obtaining the mean of two 
samples combined. The formula is as fol- 
lows: 


$$\bar{X} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$


As long as the separate means are unequal,
the combined mean is equal to $(\bar{X}_1 + \bar{X}_2)/2$
if, and only if, $N_1 = N_2$. Therefore, the two
procedures will produce different estimates 
of main effects for individual-differences 
variables. The size of the difference in esti- 
mate depends, of course, on the size of the 
correlation between the variables and the 
points of dichotomization. Investigators 
must worry about which procedure they 
use. Furthermore, when one of the two 
procedures is clearly correct by a simple 
mathematical test as above and the other is 
incorrect by that test, a more complex math- 
ematical development that seemingly re- 
verses this conclusion should be viewed with 
caution. It is not surprising to find under 
these circumstances that the more complex 
mathematics depend on an assumption that 
leads to a special, limited case. 
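
The elementary argument can be checked numerically. The sketch below compares the N-weighted and the equally weighted combination of two cell means for one row of a 2 x 2 layout; the cell sizes and means are hypothetical values chosen only to illustrate unequal Ns.

```python
def weighted_mean(n1, mean1, n2, mean2):
    """Combined mean of two samples, each weighted by its N."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def unweighted_mean(mean1, mean2):
    """Equally weighted combination used by the pseudo-orthogonal design."""
    return (mean1 + mean2) / 2.0


# Hypothetical cells in one row of a 2 x 2 table of correlated variables.
n_a, mean_a = 70, 60.0
n_b, mean_b = 30, 50.0

print(weighted_mean(n_a, mean_a, n_b, mean_b))  # weighting by N
print(unweighted_mean(mean_a, mean_b))          # equal weighting
print(weighted_mean(50, mean_a, 50, mean_b))    # equal Ns: same as equal weighting
```

With these hypothetical values the two row means differ by two points, and the difference vanishes only when the two Ns are equal.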


Interrelationships of Various Methods 


A theoretical analysis of various ways of 
handling the problem of unequal Ns by 
means of the analysis of variance is now re- 
quired. Involved are the three methods of 
Overall and Spiegel (1969) and the weighting-by-N
procedure of Humphreys and
Dachler (1969a, 1969b). While the use of
equal Ns in the pseudo-orthogonal design 
can be considered a fifth procedure, its pre- 
cise relationships with the other four have 
not been worked out analytically. Under fa- 
vorable circumstances with respect to choice 
of dichotomization, which will be discussed 
later, it approximates one of the least 
squares methods. 

The four analysis of variance methods are 
also related to traditional correlational 
analysis. To show this, it is helpful to use 
two correlational statistics, relating inde- 
pendent to dependent variables, derived 
from the analysis of variance. In order to 
equate these analysis of variance statistics
with the usual correlational analysis, how- 
ever, sample statistics rather than popula- 

tion estimates are presented in both in- 
stances. For a population estimate in the 
data which follow, the mean square for error
should be subtracted from the numerator 
and added to the denominator for both 
omega-squared (Hays, 1963) and epsilon- 
squared (Peters & Van Voorhis, 1940). 
$$\omega^2 = \frac{\text{sum of squares particular source}}{\text{sum of squares total}}$$


$$\epsilon^2 = \frac{\text{sum of squares particular source}}{\text{sum of squares particular source} + \text{sum of squares error}}$$
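
As a rough illustration of the two sample statistics, the sketch below computes them from sums of squares. The numerical values are hypothetical, and the `population_estimate` helper simply applies the adjustment as it is worded in the text (mean square for error subtracted from the numerator and added to the denominator); it is an assumption that this wording is meant for single-degree-of-freedom sources, as in the two-level case discussed here.

```python
def omega_squared_sample(ss_source, ss_total):
    """Sample omega-squared: source sum of squares over total sum of squares."""
    return ss_source / ss_total


def epsilon_squared_sample(ss_source, ss_error):
    """Sample epsilon-squared: source over source plus the chosen error term."""
    return ss_source / (ss_source + ss_error)


def population_estimate(numerator, denominator, ms_error):
    """Adjustment as worded in the text (assumed here for 1-df sources)."""
    return (numerator - ms_error) / (denominator + ms_error)


# Hypothetical sums of squares for one source in a two-level design.
ss_source, ss_error, ss_total = 20.0, 60.0, 100.0
ms_error = ss_error / 96  # hypothetical error degrees of freedom

print(omega_squared_sample(ss_source, ss_total))
print(epsilon_squared_sample(ss_source, ss_error))
print(population_estimate(ss_source, ss_total, ms_error))
```

The two sample values differ whenever the chosen error term is smaller than the total minus the source sum of squares, which is why the choice of error term matters for epsilon-squared.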


For convenience and ease of understand- 
ing, the discussion will assume only two lev- 
els each of two independent variables and 
the omission of no middle cases from the 
analysis. The relationships can easily be 
generalized to more than two independent 
variables and to more than two levels, but 
they cannot as readily be generalized to ex- 
clusion of middle cases. The above measures 
of the degree of relationship in the sample 
derived from the analysis of variance will 
be compared with squared partial and part 
correlations computed from point biserials 
between independent and dependent varia- 
bles and phi coefficients between the inde-
pendent variables.

Table 1 presents the interrelationships 
among the four analysis of variance ap- 
proaches and the relationship of each to cor- 
relational analysis. The first three methods 
are identical with those described by Over- 
all and Spiegel (1969). The fourth method 
is the weighting-by-N procedure which is 
presented in the symbols used by Overall 
and Spiegel. Definitions of various “error”
or residual terms required by the relation-
ships with correlational analysis are given 
in Footnote a, Table 1. 

Since the reader can readily supply the
equivalents of omega-squared and epsilon-squared
written in terms of the analysis of
variance components specified in the formulae
presented, only the equivalent, squared
point-biserial correlations and their derivatives
appear in the appropriate columns.
Setting off certain symbols by means of parentheses
indicates part correlations, and
the same expressions without the parentheses
indicate partial correlations. The symbol
AB indicates the interaction between the
independent variables, while X denotes the
dependent variable.

The omega-squared statistic requires the
total sum of squares in the denominator in
all cases. Thus, except when estimates of
population values are involved, there is no




ANOVA AND INDIVIDUAL DIFFERENCES 


TABLE 1
SUMMARY OF RELATIONSHIPS AMONG FOUR LEAST SQUARES ANALYSIS OF VARIANCE METHODS FOR USE
WITH UNEQUAL Ns AND COMPARISON OF EACH WITH CORRELATIONAL ANALYSIS

[Table body not recoverable. For each of Methods 1 through 4 the rows are the sources A, B, AB, and Total; the columns give the degrees of freedom, the sum of squares, and the equivalent correlational expressions for omega-squared and epsilon-squared with their error terms.]

Note. Setting off certain symbols by means of parentheses indicates part correlations, while the same expressions without the parentheses indicate partial correlations. The symbol AB indicates the interaction between the independent variables, while X denotes the dependent variable.
a Numbers in this column refer to the following error or residual terms: [definitions not recoverable].




problem with respect to choice of error term. 
A choice of error term is required, however, 
for epsilon-squared. The error used by Over- 
all and Spiegel (1969) does not lead to or- 
dinary correlational statistics in all cases. 
Other error terms must be substituted so 
that the correlations will be meaningful ones 
for individual-differences variables. These 
are reported in the right-hand epsilon- 
squared column. When unequal Ns carry no 
psychological meaning, however, as is true 
for experimentally controlled variables, the 
"odd-ball" correlations in the righthand 
column which use the Overall and Spiegel 
estimate of error are meaningful. For ex- 
perimental data, the operations on the stan- 
dard deviations of the dependent variable 
called for by the equations are carried out 
very simply, that is, by omission of the other 
independent variables from a subsequent ex- 
periment. 

For individual-differences variables, the 

operations specified by the equations utiliz- 
ing the minimum error term cannot be made 
operational. The results are analogous to a 
correction for attenuation, which places the 
standard deviation of true scores in the de- 
nominator of the correlation in place of the 
standard deviation of obtained scores. Mea- 
surement error variance can be more readily 
manipulated, however, than the variance in 
a dependent variable associated with indi- 
vidual-differences variables other than the 
one under study.
The choice of error term is also important
in hypothesis testing. When working with 
individual-differences variables, the same 
error terms used in computing the correla- 
tions must be used in tests of significance. 
When variables are under experimental con- 
trol, on the other hand, the unequal Ns carry 
no psychological meaning, and the minimum 
error term of Overall and Spiegel (1969) is 
appropriate. 


Advantages of Correlational Analysis 


The information in Table 1 illustrates 
very well the most important reason for 
the use of correlational analysis when deal- 
ing with individual-differences variables, 
namely, the ease and flexibility of correla- 
tional analysis. Four different methods for 
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the analysis of data involving unequal Ng 
produce different results, but these different, 
results can all be obtained quickly and eas- 
ily from only three, zero-order correlations, 
Note also that the four analysis of variance 
methods, in toto, do not exhaust all of the 
part and partial correlations one can com- 
pute. If, and only if, the independent varia- 
bles and their interaction are mutually or- 
thogonal are the values in the omega- 
squared column identical, but this identity 
does not carry over to epsilon-squared.
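
As an illustration of how part and partial correlations follow from three zero-order correlations, the sketch below applies the standard first-order formulas. The function names and the correlation values are hypothetical and are not taken from Table 1.

```python
import math


def partial_corr(r_xa, r_xb, r_ab):
    """Partial correlation of X with A, holding B constant in both."""
    return (r_xa - r_xb * r_ab) / math.sqrt((1 - r_xb ** 2) * (1 - r_ab ** 2))


def part_corr(r_xa, r_xb, r_ab):
    """Part (semipartial) correlation of X with A, with B removed from A only."""
    return (r_xa - r_xb * r_ab) / math.sqrt(1 - r_ab ** 2)


# Hypothetical zero-order correlations, chosen only for illustration.
r_xa, r_xb, r_ab = 0.50, 0.40, 0.30
print(round(partial_corr(r_xa, r_xb, r_ab), 3))
print(round(part_corr(r_xa, r_xb, r_ab), 3))
```

When the two independent variables are uncorrelated (r_ab = 0), the part correlation reduces to the zero-order correlation, which is the orthogonal case described above.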

The advantages of correlational analysis 
are compounded if more than three varia- 
bles are under investigation, for example, 
three independent variables and one dependent
variable. The analyst can hold constant
as needed any number of independent
variables and their interactions, while many 
analyses of variance would be required to 
obtain the same information.

There are other important advantages of
correlational analysis. For one, the analysis 
of variance not infrequently produces in 
both the investigator and his audience the 
illusion that he has experimental control 
over the independent variables. Nothing 
could be more wrong (see Underwood, 1957). 
Correlational analysis is less apt to produce 
this illusion. Furthermore, in correlational 
analysis, each variable in turn can be treated 
as if it were the dependent variable. This 
also deters too ready attribution of effect to 
cause.

Another important advantage of using 
correlational analysis on continuous varia- 
bles is that dichotomization reduces the size 
of linear relationships. Statistical tests based 
upon continuous distributions are more pow- 
erful than those based upon dichotomized 
distributions. If regressions are linear, correlations
among continuous variables are
reduced by the factor $y/\sqrt{pq}$, where y is the ordinate
of the normal curve at the point of dichotomization
and p and q are the proportions of cases on either side of it,
when the variables are dichotomized without discard
of cases.
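
A minimal sketch of this reduction factor, assuming a standard normal variable cut at a given standard score. The function name is hypothetical.

```python
from statistics import NormalDist


def dichotomization_factor(cut_z):
    """Return y / sqrt(p * q) for a standard normal variable cut at cut_z."""
    nd = NormalDist()        # standard normal: mean 0, SD 1
    p = nd.cdf(cut_z)        # proportion of cases below the cut
    q = 1.0 - p              # proportion of cases above the cut
    y = nd.pdf(cut_z)        # ordinate of the normal curve at the cut
    return y / (p * q) ** 0.5


for z in (0.0, -0.5, -1.0):
    print(z, round(dichotomization_factor(z), 3))
```

Under these assumptions the factor is largest for a cut at the mean (about .80) and shrinks as the cut moves toward either tail.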

If relationships among variables are not
linear, this information is lost entirely in
the reduction of data points to two. The use
of three or four categories rather than
two still provides only the roughest basis
for estimating the shape of a curve. When
curvilinear regressions cannot be corrected
by a transformation of the units of measure- 
ment, complications are introduced into cor- 
relational analysis; but these complications 
are not solved by ignoring them. 

Possible interactions among variables can 
be handled almost as readily when working 
with continuous variables as with dichotomous
ones. In a 2 × 2 table, the interaction
is a direct function of the product of the
dichotomous measures; for continuous 
scores the simple product will also ordinarily 
suffice, though more complex expressions are 
possible. 


COMPUTER SIMULATION 


It was hypothesized that dichotomization 
might introduce spurious interactions be- 
tween independent variables on the basis of 
the known effect on the shape of the regres- 
sion of a continuous dependent variable on 
a dichotomized independent variable. Com- 
puter simulation was the only feasible way 
to attack this problem. Computer simula- 
tion also clarifies the relationship between 
use of equal Ns, or unweighted means, and 
the other analytical analysis of variance 
methods. 


Design of the Simulation 


The 6 three-variable correlation matrices 
shown in Table 2 were selected so that par- 
tial correlations would be positive, zero, and 
negative; correlations between independent 
variables would be both large and small; and 
correlations between the two independent 
variables and the dependent variable would 
vary. 

Each two-way distribution was bivariate 
normal in the population, and regressions 
were linear. Statistics based on random sam- 
ples of 10,000 cases were obtained for each
set of population values. Therefore, sample
statistics should closely approximate population
parameters. In each matrix, Variable
X was designated as the dependent variable
and the other two were dichotomized at 
−1.0, −.5, 0, +.5, and +1.0 standard score
units. Printouts were obtained of the means 
of the dependent variable in each of the four 
cells in the 25 combinations of the two 
dichotomies for each of the six samples. 
Three analysis of variance summary tables 
were also obtained: Methods 2 and 4, pre- 
viously discussed, and the unweighted, or 
equal-weighted, means analysis. For the 
latter, 2,500 cases were assigned to each of 
the four cells, which is in line with the logic 
of the method. 
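
The design can be sketched as follows. This is an illustrative reconstruction, not the authors' program: the correlation values are hypothetical (the entries of Table 2 are not legible in this copy), and the function names are assumptions. Three standard normal variables (A, B, X) are generated with the specified correlations, A and B are dichotomized, and the N and mean of X are tabulated for each of the four cells.

```python
import math
import random


def cholesky3(r_ab, r_ax, r_bx):
    """Lower-triangular factor of the 3 x 3 correlation matrix of (A, B, X)."""
    l21 = r_ab
    l22 = math.sqrt(1 - l21 ** 2)
    l31 = r_ax
    l32 = (r_bx - l31 * l21) / l22
    l33 = math.sqrt(1 - l31 ** 2 - l32 ** 2)
    return l21, l22, l31, l32, l33


def simulate_cells(r_ab, r_ax, r_bx, cut_a=0.0, cut_b=0.0, n=10_000, seed=0):
    """Return {(a_high, b_high): (cell N, cell mean of X)} for one sample."""
    rng = random.Random(seed)
    l21, l22, l31, l32, l33 = cholesky3(r_ab, r_ax, r_bx)
    counts, sums = {}, {}
    for _ in range(n):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = l21 * z1 + l22 * z2
        x = l31 * z1 + l32 * z2 + l33 * z3
        key = (a > cut_a, b > cut_b)   # dichotomize both independent variables
        counts[key] = counts.get(key, 0) + 1
        sums[key] = sums.get(key, 0.0) + x
    return {k: (counts[k], sums[k] / counts[k]) for k in counts}


# Hypothetical population correlations; both cuts at the population means.
for cell, (cell_n, cell_mean) in sorted(simulate_cells(0.5, 0.4, 0.4).items()):
    print(cell, cell_n, round(cell_mean, 2))
```

Repeating the call over the five cut points for each variable gives the 25 combinations of dichotomies described above.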


Equal N Method 


In Table 3 are shown selected data from 
the simulation. Correlations involving Vari- 
able B were selected because holding Vari- 
able A constant in relationships involving 
Variable B has a bigger effect than the reverse
procedure. It is seen that the statistics
for the equal N method are almost identical 
with those for Method 2 of Overall and 
Spiegel (1969) only in the data set involv- 
ing a zero correlation between the two inde- 
pendent variables and dichotomization at 
the population means. If dichotomization 
had been at the sample medians, the results 
would have been identical. Retaining di- 
chotomization at the means but allowing 
the correlation between the independent 
variables to increase (reading from left to 
right in the table) produces an increasing 
amount of overestimation of the main effect. 

The reason for this increasing inaccuracy 
of the equal N procedure is that within- 
group variances are not homogeneous. No 
matter how N and s² for the four groups


TABLE 2
POPULATION CORRELATIONS USED IN THE SIMULATION

[Table body not recoverable from the source scan.]


TABLE 3
EPSILON FOR VARIABLE B OBTAINED BY THREE METHODS IN SIX DATA SETS
FOR TWO COMBINATIONS OF DICHOTOMIES

                                   Data set
Method                    1      2      3      4      5      6

Dichotomization at means
Method 2 (r_XB·A)       .359   .220   .212   .053   .074  −.221
Equal Ns                .355   .235   .220   .055   .082  −.265
Method 4 (r_XB)         .359   .342   .356   .162   .373   .210

Dichotomization at −1.0 and −1.0
Method 2 (r_XB·A)       .287   .208   .195   .069   .089  −.127
Equal Ns                .404   .275   .261   .062   .104  −.254
Method 4 (r_XB)         .281   .283   .295   .127   .290   .146

vary, however, each variance is given equal 
weight in the pooling procedure. Further- 
more, the smaller variances are typically 
based upon the smaller Ns, so that there is 
a consistent bias downward of the estimated 
error variances. When dichotomization is at 
the means, the off-diagonal quadrants (as 
defined by the positive correlation between 
the independent variables) have smaller 
variances and smaller Ns than the diagonal 
quadrants. Thus, the main effects in the 
equal N procedure with dichotomization at 
the means are the equivalent of partial cor- 
relations computed with estimates of error 
containing a negative bias. Thus, the partial 
correlations are overestimated. 
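
The direction of this bias can be illustrated with a small sketch. The cell Ns and variances below are hypothetical, chosen only so that the smaller cells also have the smaller variances, as described above. The function name is an assumption.

```python
def pooled_variance(ns, variances, equal_weight=False):
    """Pool within-cell variances, weighting by df or giving each cell equal weight."""
    if equal_weight:
        return sum(variances) / len(variances)
    dfs = [n - 1 for n in ns]
    return sum(df * v for df, v in zip(dfs, variances)) / sum(dfs)


# Hypothetical 2 x 2 layout: diagonal cells are larger and more variable.
ns = [3300, 1700, 1700, 3300]
variances = [0.90, 0.70, 0.70, 0.90]

weighted = pooled_variance(ns, variances)
unweighted = pooled_variance(ns, variances, equal_weight=True)
print(round(weighted, 3), round(unweighted, 3))
```

With these values the equal-weight estimate of error variance is smaller than the df-weighted estimate, so effects tested against it appear larger.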

For all levels of correlation between inde- 
pendent variables, the approximation also 
becomes more inaccurate as the point of 
dichotomization departs from the mean, al- 
though only one extreme point of dichotomi- 
zation is shown in the table. Again, the 
direction of the error is one of overestimation 
of the partial correlation. The simulation 
involved giving an extreme 15% group the 
same weight as the remaining 85%; under 
these circumstances, both the numerator 
and the denominator of the correlations are 
distorted. Note, however, that giving groups 
of this sort equal weight is well within the 
limits of practice in the use of the pseudo- 
orthogonal design. A common example that 
truly involves an underlying continuum, 
though possibly not a normal one (an example
which is not immediately recognized
as such), is the comparison of equally
weighted minority and majority groups on 
some dependent variable. 

If both the size and sign of epsilon are 
recorded in all data sets for the three meth- 
ods used in the simulation, it is quite clear 
that the equal N method provides estimates 
of main effects that approximate Method 2 
much more closely than the weighting-by- 
N procedure. The equal N procedure does 
indeed lead to estimates of main effects and 
of interactions that approximate partial cor- 
relations rather than zero-order relation- 
ships as claimed by Humphreys and Dachler
(1969a, 1969b), but the error in the
approximation becomes quite large in ex- 
treme dichotomies. 

The above must not be interpreted as 
meaning that the partial correlations ap- 
proximated are necessarily those presented 
in Table 1 for Method 2. Theoretical rela- 
tionships have not been worked out. In the 
present data sets, differences between Meth- 
ods 1 and 2 are minimal because the popula- 
tion from which these samples were drawn 
had no interaction between the two inde- 
pendent variables, and interactions in the 
dichotomized samples were small even when 
nonzero. 


Nonzero Interactions Produced by 
Dichotomization 


Spurious interactions are produced by 
dichotomization in the absence of interac- 
tion between the continuous variables. These 
are quite small in Method 2, but their con- 
sistency supports the generalization. The 
zero-order interactions in Method 4 can be 
quite large, but these of course reflect in 
large part the size of main effects. To assess
Method 4 interactions, it is necessary to 
compute partial correlations. 

Interactions obtained by the equal- 
weighting procedure are perhaps the only 
ones that might be considered psychologically
significant and which might be
statistically significant in much smaller
samples. This is seen in Table 4, which con- 
tains the mean and maximum absolute 
values of epsilon for each data set. The 
largest means are associated with data sets 
involving the higher intercorrelations, but


TABLE 4
MEANS AND MAXIMA OF EPSILON FOR INTERACTION IN SIX DATA SETS
FOR ALL DICHOTOMIES

                         Data set
Item             1      2      3      4      5      6

Method 2
  M            .000   .018   .027   .030   .046   .060
  Maximum      .000   .041   .055   .049   .076   .092
Equal N
  M            .000   .037   .045   .050   .101   .183
  Maximum      .011   .106   .084   .090   .180   .336


the heterogeneity in size of these correla- 
tions is also involved. Maximum values 
appear in combinations involving extreme 
dichotomies. 


Numerically Small, But Seemingly Large, 
Interactions 


Crossovers of the sort reported by Jensen 
(1968), in which a difference in means at 
one level of an independent variable is dif- 
ferent in sign from the difference at another 
level, can occur even though the size of the 
interaction using appropriate methodology 
is quite small. To those accustomed to the 
use of the analysis of variance with orthog- 
onal data this seems quite incongruous 
since a crossover is, under those circum- 
stances, evidence for an interaction. For 
correlated independent variables, however, 
the crossover represents rather closely the 
linear expectation, subject only to the small 
spurious interaction previously described. 

A necessary, but not a sufficient, con- 
dition for a single crossover to occur is a 
small, though still positive, partial point biserial
for one of the independent variables.
It is also necessary that the more highly 
correlated independent variable be repre- 
sented by an extreme dichotomy. The cross- 
ing over is, of course, independent of 
whether an unweighted analysis or a least 
squares analysis is used, but the crossing 
over is larger and more likely to be statis- 
tically significant in the unweighted analy- 
sis. 

Double crossovers, in which the differ- 
ence in means at both levels of one of the in- 
dependent variables has the same sign but 
differs in sign with respect to the weighted-by-N
main effect, also occur. To casual observation
this suggests some sort of complex
interaction, but it again represents the linear 
expectation with relatively little error. For 
double crossovers to occur, the partial corre- 
lation for one of the independent variables 
with the dependent variable must be nega- 
tive. 

It is not possible to summarize all of the 
relevant computer output, but examples 
have been selected of both single and double 
crossovers and are presented in Table 5.
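
A worked check of a single crossover, using the first set of cell Ns and means listed for Data set 5 in Table 5. Which independent variable defines the rows and which the columns is not legible in this copy, so the row and column labels in the sketch are assumptions.

```python
def weighted_mean(ns, means):
    """Mean of subgroup means, weighting each by its N."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)


# Cell Ns and means as read from Table 5 (Data set 5, first 2 x 2 layout).
n = [[849, 7517], [752, 882]]
m = [[-0.21, 0.52], [-2.41, -2.49]]

# Difference between the two columns within each row.
simple_effects = [row[1] - row[0] for row in m]

# Column means weighted by N, and the resulting weighted main effect.
col_means = [weighted_mean([n[0][j], n[1][j]], [m[0][j], m[1][j]]) for j in range(2)]
weighted_effect = col_means[1] - col_means[0]

# Main effect when the two rows are given equal weight.
unweighted_effect = sum(simple_effects) / 2

crossover = simple_effects[0] * simple_effects[1] < 0
print(simple_effects, crossover)
print(round(weighted_effect, 2), round(unweighted_effect, 3))
```

The column difference changes sign between the two rows, which is the crossover, and the weighted and equal-weight main effects differ substantially in size.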


DISCUSSION AND CONCLUSIONS


There are warnings by highly respected 
authorities concerning arbitrary orthogonal- 
ization of correlated independent variables. 
Thus, McNemar (1969) has a section (pp.
444-449) concerning this problem, and 
Winer (1962) draws the following conclu- 
sion: 

On the other hand, should the cell frequencies be
directly related to the size of corresponding population
strata, then such frequencies should be used
in estimating the mean of the population composed
of such strata [p. 224].


This advice has not only been disregarded 
theoretically, as in the paper by Fischbach 
and Walberg (1971), but it has also been 
disregarded with increasing frequency in 
data analysis by research workers. Educa- 
tional psychology, child development, and 


TABLE 5
Ns AND MEANS FOR SUBGROUPS AND FOR MAIN EFFECTS IN TWO DATA SETS
ILLUSTRATING SINGLE AND DOUBLE CROSSOVERS

            Data set 5                    Data set 5
n      849    7517    8366         3583    4783    8366
M     −.21     .52     .45          .03     .70     .45
n      752     882    1634         1379     255    1634
M    −2.41   −2.49   −2.45        −2.44   −2.51   −2.45
n     1601    8399                 4962    5038
M    −1.24     .21                 −.65     .59

            Data set 6                    Data set 6
n      775    7650    8425         1247    3704    4951
M      .85     .42     .40         1.89    1.09    1.29
n      811     764    1575         3753    1296    5049
M    −1.94   −3.02   −2.46        −1.06   −1.85   −1.27
n     1586    8414                 5000    5000
M     −.58     .11                 −.33     .33


personality research perhaps head the list
in frequency of usage of the pseudo-orthogonal
design, but traditional experimental fields 
are not pure in this regard. An aura of the 
laboratory and of controlled experimentation
accompanies the use of the analysis
of variance, while correlational analysis is 
perceived as not experimental. Every under- 
graduate, or almost every one, knows that 
correlation does not mean causation. It is 
not as well known that t ratios and F ratios 
are also measures of relationship and that it 
is the ability of the experimenter to make 
random assignments of persons to treatment 
groups that distinguishes experimental con- 
trol from statistical control. To put it 
bluntly, the analysis of variance has become 
more popular by far than it deserves, and 
not only in research on individual differ- 
ences. The preferred alternative varies from 
one set of problems to another, but for most 
research involving individual-differences 
variables, correlational analysis is strongly 
preferred to the analysis of variance. 

Some of the literature involving the use of 
the pseudo-orthogonal design can be salvaged.
Interpretation of main effects as
approximations to partial correlations rather 
than zero-order relationships is legitimate 
and useful. If dichotomization is at the 
means or medians, the approximation is 
reasonably accurate. If the dichotomies in 
the population are of grossly unequal size, 
however, the amount of error in the approxi- 
mation is substantial. 

With respect to the interactions stemming 
from the use of the pseudo-orthogonal de- 
sign, one cannot be as sanguine about sal- 
vage. These must be analyzed carefully with 
respect to possible spuriousness. With correlated
variables, a “crossover” does not in
itself support the interpretation of inter- 
action. Use of products of continuous mea- 
sures in correlational analysis is legitimate 
and should be encouraged. This is especially 
true for selected products dictated by theory 
or even hunch. On a trial-and-error basis, 
on the other hand, product terms could 
easily exhaust available degrees of freedom. 

There are still other cases in which the 
literature as a whole should probably be 
discarded and a fresh start should be made.
Consider such designs as the one employed
by Orn and Das (1972). Socioeconomic
status and intelligence were made orthogonal
by equating Ns. (The authors were
engaged in a replication of some of Jensen's, 
1968, research.) In addition, however, 
groups were equated, more or less, for men- 
tal age. This, in turn, insured large differ- 
ences between IQ groups in chronological 
age. It is significant that only about 9% of 
the children surveyed could be selected for 
the experimental groups. With one depend- 
ent rote-learning variable, they replicated 
Jensen’s finding of a crossover involving 
socioeconomic status and IQ, though fail- 
ing to note that chronological age and socio- 
economic status also, of necessity, inter- 
acted. In such data, only the fact of repli- 
cability is of any importance since psycho- 
logical interpretation is impossible, or 
virtually so. Prime candidates for the re- 
search wastebasket are pseudo-orthogonal 
designs that also involve matching on still 
other individual-differences measures.


The Positive Reinforcement Observation Schedule was used to obtain
data concerning expressed preference for and observed use of positive 
reinforcement in integrated classrooms taught by 30 black and 30 
white female elementary school teachers. Results indicated the follow- 
ing: (a) Black and white teachers emit virtually equal rates and types 
of reinforcers. (b) Only 32% of the combined sample actually use the 
reinforcers they state they prefer. (c) Reinforcement emission in class- 
rooms occurs at a relatively low rate. (d) Teachers reinforce opposite- 
raced children more frequently than children of their own race. (e) 
Males are reinforced more frequently than females. (f) Black females 
are the least reinforced group of all. These results are related to previous
research and implications are drawn.


It is now well established that contingent 
reinforcement by teachers has direct effects 
on both academic and nonacademic behav- 
ior of students. The interaction of variables 
such as age (Cradler & Goodwin, 1971; Ro- 
senhan & Greenwald, 1965), sex (Rosenhan
& Greenwald, 1965; Rucinski, 1968), intelligence
(Wolfensberger, 1960), and social
class (Cradler & Goodwin, 1971; Shores, 
1969; Zigler & Kanzer, 1962) as they influ- 
ence reinforcement effectiveness has also 
been widely studied (though with few find- 
ings that are either clear or consistent).
Yet, Staats’ (1968) indictment that systems 
of reinforcement in school situations have 
never been subjected to systematic study 
and research remains correct. With very 
few exceptions (e.g., Meyer & Lindstrom, 
1969), observations of reinforcing behavior 
in the natural setting of the classroom are 
absent from the literature. Thus, while 
there is a genuine concern over the propri- 
ety and utility of concrete, extrinsic and 
intrinsic reinforcement as they relate to 


? Some of the data presented originally appeared 
in the senior author’s doctoral dissertation com- 
pleted at the University of Georgia. 

*Reprints may be requested from Donald N. 
Bersoff, who is now at the University of Virginia 
Law School, Charlottesville, Virginia 22902. 


children’s learning, there is a dearth of nor- 
mative data regarding the actual emission 
of the variety of reinforcers available to 
teachers. 

If normative data concerning teacher re- 
inforcement emission is scanty, such infor- 
mation is almost nonexistent for biracial 
samples of teachers. Increasing numbers of 
black and white teachers are facing inte- 
grated classrooms, yet very little empirical 
data concerning teacher-child interaction in 
such classes has been published. Reviewing 
the literature recently, Meyer and Lind- 
strom (1969) commented that observational 
studies of teacher behavior toward black 
and white children were unknown to them.
Their own study, comparing the rates of 
approval and disapproval in biracial class- 
rooms, yielded no differences between black 
and white teachers toward classes of mixed 
race/sex children. However, only 13 teach- 
ers were studied and only 4 of those were 
black. 

Lastly, previous research (Bersoff & 
Moyer, 1973) has suggested that teachers 
have rather clear ideas concerning the kinds 
of reinforcers they believe to be the most
effective in the classroom. What is not 
known is whether teachers actually use 
those reinforcers they state they prefer. In
the light of the above concerns, the present
investigation had three major goals:

1. Begin a systematic inquiry into the 
reinforcement practices of teachers as they 
exhibit them under natural classroom con- 
ditions. 

2. Compare the rate of reinforcement 
emission of a sizable sample of black and 
white teachers in integrated classrooms. 

3. Compare the rates of actual reinforce- 
ment emission of teachers with their stated 
preferences. 


METHOD


Subjects 


The subjects were 60 female elementary school 
teachers, 30 white and 30 black, from 15 schools in 
the southeastern United States. Teachers were 
selected based on their willingness to involve 
themselves in the study. The nature of the study 
was explained as the researchers’ desire to observe 
teacher-child interaction. Five white teachers and 
five black teachers from each of Grades 1 through 
6 participated. Each group had equivalent teaching 
experience (white teachers = 14.6 years; black
teachers = 15.9 years) and an equal number in 
each group (9 out of 30) had had direct exposure 
to behavioral technology. 

The classes of the participating teachers included 
a total of 1,398 children. Table 1 categorizes the 
number of children by sex and race for the two 
groups of teachers. Only those classrooms with at 
least 20% of a single race were observed.


Instruments 


The Positive Reinforcement Observation Sched- 
ule was the instrument used to observe and record 
reinforcement emission. Extended descriptions of 
the scale, including validity and reliability data, 
have been reported elsewhere (Bersoff, 1973; 
Bersoff & Moyer, 1973). Briefly, the scale consists 
of 10 categories of behaviors that may be emitted 
by mediators (i.e., teachers, parents). These behaviors
appear to possess reinforcing qualities that
are both powerful and durable, but there is no 
assumption made concerning the trans-situational 
nature or generalized effectiveness of these cate- 
gories. Underlying the Positive Reinforcement Ob- 
servation Schedule is the concept that reinforce- 
ment is a relational rather than an absolute prop- 
erty of any activity. The scale may be used as a 
mediator preference scale when constructed in a 
paired-comparison format and as an observation 
schedule to obtain frequency and rate of positive 
reinforcement. The 10 positive reinforcement cate- 
gories, their symbols, and highly abbreviated definitions
follow:*


* Complete definitions with behavioral examples 
may be secured from the second author.

Administration of Concrete Rewards, Direct
(CRD): giving immediate concrete rewards such 
as candy, money, or free time; 

Administration of Concrete Rewards, Token 
(CRT): giving symbolic rewards redeemable for 
direct concrete rewards at some future time; 

Affirmation of Appropriate Behavior (AAB): 
verbal contact indicating that responses are correct,
acceptable, or appropriate;

Rapport-Praise (RP): evaluative reactions 
which go beyond the level of simple affirmation. 
Rapport-Praise communicates a positive evaluation
or a warm personal reaction;

Positive Facial Attention (FA+): looking at 
target when mediator is smiling or attending to 
what target is doing or saying; 

Positive Physical Contact (PC+): patting, em- 
bracing, holding arm, taking hand; 

Accepts Feelings (AF): accepting or clarifying 
feeling tone of target in nonthreatening manner; 

Accepts Ideas (AI): clarifying, building, or developing
ideas suggested by target;

Adjuvant Mastery (AM): urging, prompting, 
fostering, promoting confidence, success, providing 
encouragement for response production;

Aiding by Example (AE): demonstration of ap- 
propriate behavior by mediator when target is 
either nonresponsive or incorrect in exhibiting ex- 
pected response. 


Procedure 


The subjects were observed while teaching a
lesson involving the entire class. Observers sat at 
the side or back of the classroom to minimize in- 
terference with the natural classroom environment 
and to better view any nonverbal teacher behavior. 
No observations were made on Monday mornings, 
Friday afternoons, or immediately before or after 
holidays. Observers entered the classroom approxi- 
mately one half hour before the 45-minute lesson 
began. Actual recording did not begin until the 
last half hour of the lesson. Thus, data collection 
did not begin for at least 45 minutes after the ob- 
server entered the room. The positive reinforce- 
ment behaviors which were observed and tallied 
were those corresponding to Positive Reinforce- 
ment Observation Schedule events. 

Observations were made by the senior author 
and by another trained observer, a female graduate
student in educational psychology with prior ex- 
perience as a teacher.’ The senior author made 
60% of the observations (19 out of 30 in classes
with black teachers and 17 out of 30 in classes with 
white teachers). 

Prior to data collection, three teachers—one 
black and two white—were observed to obtain ac- 
ceptable interjudge agreement. The two observers 
were seated on opposite sides of the room and upon 
signal, teacher behavior was recorded for 30 min- 
utes. Only those teacher behaviors which followed 


‘The authors wish to express their appreciation 
for the work of Maggie Weshner who served as 
the second observer.


TABLE 1
OBSERVED USE AND EXPRESSED PREFERENCE OF POSITIVE REINFORCEMENT
IN BIRACIAL CLASSROOMS

                          Frequency of use with children
PROS      No. of      White    White    Black    Black            Rank of            Expressed
event     teachers    males   females   males   females   Total     use     % used     rank

Black teachers (n = 30)^a
n                      244      200      115      120      679
AAB          30        306      133       93       54      585       1       .471       2
FA+          27        117       61       38       27      243       2       .195       3
AI           26         86       29       26       14      155       3       .125       4
AF           22         33       15        5        7       60       4       .048       6
AE           16         28       15        9        4       56       5       .045       9
PC+          15         23       13       10        8       54       6       .043      [?]
RP           10         24       11        8        3       46       7       .037       1
AM           11         16        3       11        3       33       8       .027       5
CRT           3          4        3        1        3       11       9       .009      10
CRD           0          0        0        0        0        0      10       .000       9
Total                  637      283      200      123     1243              1.000
Mean positive reinforcements
per child             2.61     1.42     1.74     1.02     1.83

White teachers (n = 30)^b
n                      254      227      131      107      719
AAB          29        184      125      149       88      546       1       .450       2
FA+          29         93       69       80       30      272       2       .224       5
AI           22         52       53       32       15      152       3       .125       3
AF            7          2        7        1        0       10       9       .008       6
AE           16         19       14       16        8       57       5       .047      7.5
PC+          13         25       13        6        6       50       6       .041      7.5
RP           14         22       35       13       14       84       4       .069       1
AM            8         13        3        8        2       26       7       .021       4
CRT           0          0        0        0        0        0      10       .000      10
CRD           1        [?]      [?]      [?]      [?]       17       8       .014       9
Total                  415      325      308      166     1214              1.000
Mean positive reinforcements
per child             1.63     1.43     2.35     1.55     1.69

Note. Abbreviations: PROS = Positive Reinforcement Observation Schedule; AAB = Affirmation
of Appropriate Behavior; FA+ = Positive Facial Attention; AI = Accepts Ideas; AF = Accepts
Feelings; AE = Aiding by Example; PC+ = Positive Physical Contact; RP = Rapport-Praise;
AM = Adjuvant Mastery; CRT = Administration of Concrete Rewards, Token; and CRD =
Administration of Concrete Rewards, Direct.

^a Mean positive reinforcements per teacher = 41.43 and mean positive reinforcements
per teacher per minute = 1.38.

^b Mean positive reinforcements per teacher = 40.46 and mean positive reinforcements
per teacher per minute = 1.35.


appropriate student response were considered positive
reinforcements. The three observation sessions
yielded over 150 Positive Reinforcement Observation
Schedule events. A Pearson product-moment
correlation between observers for the frequencies
of the 10 Positive Reinforcement Observation
Schedule categories was calculated for each
teacher observed. Coefficients were uniformly high
(.97 for Teachers 1 and 2; .99 for Teacher 3).
Observations for the purpose of hypothesis testing
were conducted by single observers. However, as a
continuing check on observer agreement, two additional
joint observations were made during the
data collection period. Pearson product-moment
correlations for these two observations were .96 and
.95 (average r for all five agreement observations =
.974).

In calculating reliability, some Positive Reinforcement
Observation Schedule categories were
not used by teachers, possibly inflating the cor- 
relation. As a more rigorous check, correlations 
were calculated excluding the categories with zero 
frequencies. Results indicated that the agreement
between observers remained high. Recalculated 
correlations ranged from .882 to .999 (average r =
.963).
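
The agreement check can be sketched as follows. The tallies are hypothetical and are not the observers' data. The rule for dropping a category (both observers recorded zero events) is an assumption about how the zero-frequency categories were excluded.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_joint_zeros(x, y):
    """Remove categories that neither observer recorded (assumed exclusion rule)."""
    kept = [(a, b) for a, b in zip(x, y) if a != 0 or b != 0]
    return [a for a, _ in kept], [b for _, b in kept]


# Hypothetical tallies for the 10 categories from two observers of one teacher.
observer_1 = [22, 10, 6, 3, 2, 2, 1, 1, 0, 0]
observer_2 = [21, 11, 6, 2, 3, 2, 1, 0, 0, 0]

r_all = pearson_r(observer_1, observer_2)
r_used = pearson_r(*drop_joint_zeros(observer_1, observer_2))
print(round(r_all, 3), round(r_used, 3))
```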

To obtain positive reinforcement preferences, 
after the observation period was completed, each 
teacher was administered the Positive Reinforce- 
ment Observation Schedule in its paired-compari- 
son form. The teachers were instructed to deter- 
mine which of the listed positive reinforcement cat- 
egories they felt would be the most potent rein- 
forcers with children. In filling out the scale, teach- 
ers were given the following definition: Positive 
reinforcement is behavior by a teacher following 
student response for the purpose of strengthening 
or accelerating appropriate behavior.* 


RESULTS


Table 1 presents a summary of the data 
used to answer the following questions: 


1. Do black teachers and white teachers 
emit differential rates of reinforcement? 

2. Do black teachers and white teachers 
differ in the kinds of reinforcing events they 
emit? 

3. Do black teachers and white teachers 
differentially reinforce students of a certain 
sex, race, or combination of both? 

4. Do teachers emit the kinds of rein- 
forcement they express a preference for? 


Amount of Reinforcement 


Frequency of positive reinforcement 
emission was virtually equal for both races 
(black teachers, M rate = 1.38 positive reinforcements
per minute, SD = .456; white
teachers, M rate = 1.35 positive reinforcements
per minute, SD = .493; t = .303, df
= 58, p > .05). When rate/child was examined,
differences remained negligible (black
teachers = 1.83; white teachers = 1.69).
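
These rates follow directly from the totals in Table 1, as the short check below shows. It assumes 30 teachers per group and the 30-minute recording period described under Procedure. The variable and function names are illustrative only.

```python
TEACHERS_PER_GROUP = 30
MINUTES_RECORDED = 30   # recording covered the last half hour of each lesson

totals = {"black": 1243, "white": 1214}      # total reinforcements (Table 1)
children = {"black": 679, "white": 719}      # total children observed (Table 1)


def rate_per_minute(group):
    """Mean reinforcements per teacher per minute of recording."""
    return totals[group] / TEACHERS_PER_GROUP / MINUTES_RECORDED


def rate_per_child(group):
    """Mean reinforcements per child."""
    return totals[group] / children[group]


for group in ("black", "white"):
    print(group, round(rate_per_minute(group), 2), round(rate_per_child(group), 2))
```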


* Directions and a copy of the original paired-comparison
format are available from the second author.


ROBERT BYALICK AND DONALD N. BERSOFF


Kinds of Reinforcement


The types of positive reinforcements
black teachers and white teachers used in
the classroom were very similar. When a
rank-order correlation for observed use between
the two groups was calculated, the
result was a rho of .758 (p < .01). When
frequencies rather than ranks were considered,
the resultant Pearson product-moment
correlation was even higher (r = .90,
p < .001). For both groups, AAB, FA+, and
AI were the most frequently used categories,
comprising almost 80% of all reinforcement
emitted in classrooms. The groups differed
by three or more ranks for only two categories.
Black teachers accepted student
feelings (AF) with greater frequency, while
white teachers emitted more warm evaluative
verbal comment (RP).
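
The rank-order correlation can be reproduced from the observed-use ranks in Table 1. The sketch uses the usual Spearman formula for untied ranks; the function name is illustrative.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman rank-order correlation for two sets of untied ranks."""
    n = len(ranks_1)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Observed-use ranks from Table 1, in the order
# AAB, FA+, AI, AF, AE, PC+, RP, AM, CRT, CRD.
black_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
white_ranks = [1, 2, 3, 9, 5, 6, 4, 7, 10, 8]

rho = spearman_rho(black_ranks, white_ranks)
print(round(rho, 3))  # 0.758
```

The only rank differences of three or more are for AF (4 versus 9) and RP (7 versus 4), the two categories singled out above.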


Reinforcement by Race of Teacher and 
Race/Sex of Child 


This question was investigated by calcu- 
lating a t test for the contrast correspond- 
ing to the triple-order interaction of race of 
teacher and race/sex of child using a proce- 
dure adapted from Wiley (1970). To adjust 
for the different numbers of male/female 
and black/white children in each class, cor- 
rected scores were used. These scores were 
obtained by dividing the number of chil- 
dren in each of the four groups (white 
males, white females, black males, black 
females) into the number of positive rein- 
forcements given to each group by a partic- 
ular teacher (e.g., 10 black males receive 20 
positive reinforcements, corrected score = 
2). To obtain the values necessary for
calculating t, the sum of corrected scores of
white females and black males was sub- 
tracted from the sum of corrected scores of 
white males and black females. This 
method permitted the examination of the 
contrast between the differences of the var- 
ious groups. 
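
The corrected-score and contrast steps described above can be sketched as follows. This is a minimal illustration, not the authors' procedure as implemented: the function names and all classroom counts are hypothetical, except the black-male figures (10 children, 20 reinforcements), which come from the example in the text.

```python
GROUPS = ("white_male", "white_female", "black_male", "black_female")


def corrected_scores(reinforcements, n_children):
    """Positive reinforcements per child for each race/sex group in one class."""
    return {g: reinforcements[g] / n_children[g] for g in GROUPS}


def interaction_contrast(scores):
    """(white males + black females) minus (white females + black males)."""
    return (scores["white_male"] + scores["black_female"]) - (
        scores["white_female"] + scores["black_male"]
    )


# Hypothetical counts for a single classroom (assumed for illustration).
n_children = {"white_male": 8, "white_female": 6,
              "black_male": 10, "black_female": 5}
reinforcements = {"white_male": 24, "white_female": 9,
                  "black_male": 20, "black_female": 5}

scores = corrected_scores(reinforcements, n_children)
contrast = interaction_contrast(scores)
print(scores["black_male"], contrast)
```

One contrast value of this kind per teacher would then feed the t test; a class with no children in one of the four groups would need separate handling, which this sketch does not attempt.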

Using such a procedure, the overall t
calculated was 1.67 (df = 58), falling
somewhat short of the t (t = 2.00) needed for
significance. However, the result was close
enough to encourage examination of the in- 
teraction within each teacher group. Thus, t 
tests contrasting the differences among cor- 
rected scores of the various groups were cal- 
culated. 

Among the six contrasted groups, black
teachers significantly reinforced one group
in each of four pairs more often than the
other, while white teachers did the same for
one group in each of two contrasted pairs.
Specifically, the following findings resulted
from the triple-order interaction process.


From black teachers: 


White males received more positive re- 
inforcement than black females (t 
= 5.50, df = 29, p < .001). 

White males received more positive re- 
inforcement than white females (t 
= 4.90, df = 29, p < .001). 

White males received more positive
reinforcement than black males (t =
3.00, df = 29, p < .01).

Black males received more positive re- 
inforcement than black females (t 
= 2.50, df = 29, p < .02). 

Differences between white females and 
black males were insignificant (t 
= 1.07, df = 29). 

Differences between white females and
black females were insignificant (t
= 1.07, df = 29).


From white teachers: 


Black males received more positive
reinforcement than white females (t
= 2.10, df = 29, p < .05).

Black males received more positive re- 
inforcement than white males (t = 
2.05, df = 29, p < .05). 

Differences between black males and 
black females were insignificant (t 
= 1.37, df = 29). 

Differences between white males and 
white females were insignificant (t 
= 1.20, df = 29). 

Differences between white females and
black females were insignificant (t
= 0.87, df = 29).

Differences between white males and
black females were insignificant (t
= 0.10, df = 29).

Therefore, teachers (female) of both
races bestowed significantly greater
amounts of positive reinforcement on male
than on female children, though this was a
more uniform finding among black teachers.
Second, children of the race opposite to that
of the teacher received more positive
reinforcement than did same-raced children.
The amount was significantly greater in
four of the eight contrasts possible in this
regard.


Expressed Preference and Observed Use of 
Positive Reinforcement 


Rank-order correlations were calculated 
between each teacher’s observed use of posi- 
tive reinforcement in the classroom and the 
preferences she expressed on the paired- 
comparison format of the Positive Rein- 
forcement Observation Schedule. For 40% 
of the white teachers (12 out of 30), there 
was a significant positive relationship be- 
tween expressed preference and actual use 
of Positive Reinforcement Observation 
Schedule events. In contrast, 23% (7 out of 
30) of the black teachers showed such a 
significant relationship. Thus, less than
32% of all teachers’ ranked use of positive 
reinforcement matched ranked expression of 
effectiveness. Table 2 provides the fre- 
quency distribution of rhos for each group 
of teachers. Black teachers had fewer scores 
in the range of significance and fewer scores 
approaching significance. 
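
A rank-order (Spearman) correlation of the kind computed for each teacher can be sketched as below. The category data are hypothetical, the function names are illustrative, and the .564 cutoff is an assumption read from the Table 2 footnote; the sketch computes rho as the Pearson correlation of tie-averaged ranks, which may differ from the hand formula used in the study.

```python
def ranks(values):
    """Rank values from 1 (largest) downward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data for one teacher: observed event counts per category and
# the number of times each category was chosen in the paired comparisons.
observed_use = [310, 240, 190, 60, 45, 30, 20, 12, 5, 3]
expressed_preference = [6, 4, 7, 9, 2, 8, 5, 3, 1, 0]

CUTOFF = 0.564  # assumed significance cutoff (p < .05) from the Table 2 footnote
rho = spearman_rho(observed_use, expressed_preference)
print(round(rho, 3), rho >= CUTOFF)
```

Counting the teachers whose rho meets the cutoff gives the proportions reported above (12 of 30 and 7 of 30, or 19 of 60 overall).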

TABLE 2
FREQUENCY DISTRIBUTION OF RANK-ORDER CORRELATIONS
BETWEEN OBSERVED USE AND EXPRESSED PREFERENCE OF
POSITIVE REINFORCEMENT FOR WHITE TEACHERS AND
BLACK TEACHERS

                  Teachers
ρ             White     Black
.90-1.00        0         0
.80-.89         3         0
.70-.79         2         2
.60-.69         5         5
.50-.59         4         1
.40-.49         3         4
.30-.39         2         2
.20-.29         5         5
.10-.19         0         5
.00-.09         1         2
-.10-.00        5         4

* ρ = .564 (p < .05).


By combining data, rankings were obtained
for expressed preference and observed use
for each racial group. Rho correlations
between such rankings were calculated.
Results indicated that significant
correlations exist within both groups (white
teachers, ρ = .718; black teachers, ρ =
.612; p < .05). Reference to Table 1 shows
where the commonalities and disparities be- 
tween preference and use lie. While the 
group correlations are high, the individual 
correlations seem to present a more accu- 
rate and realistic view. Most teachers do 
not use the kinds of reinforcers they select 
as the most potent in accelerating children’s 
appropriate behavior. 

One further comparison provides further 
data as to the overall commonality between 
black teachers and white teachers. When 
expressed preferences are ranked and com- 
pared, the relationship between the groups 
becomes almost uniformly congruent (ρ = .961).


Discussion 


The present investigation clearly indi- 
cates that black and white teachers are vir- 
tually mirror images of each other when 
amount and kind of reinforcement emission 
is the matter of concern. These results may 
lay to rest some fears concerning the per- 
formance of black teachers and white 
teachers in integrated classrooms. Yet, 
when teacher reinforcement practices are 
viewed as a whole, there are some disheart- 
ening findings. Despite repeated evidence 
that many children, particularly those of 
low socioeconomic status, perform better 
when given material rewards (e.g., Beno- 
witz & Busse, 1970; Cradler & Goodwin, 
1971; Terrell, Durkin, & Wiesley, 1959; 
Zigler & Kanzer, 1962), both black and 
white teachers in this study gave very few 
concrete reinforcements. Only 28 CRD or 
CRT were emitted of over 2,400 observed 
Positive Reinforcement Observation Sched- 
ule events. Only 4 teachers out of 60 used 
these categories, and 1 of those 4 teachers 
accounted for 17 of the 28 events noted. 
Minuscule use of CRD and CRT matched
expressed preference. Both groups of teach- 
ers thought that concrete rewards were the 
two least potent means for accelerating ap- 
propriate behavior in children. 

Further, teachers strongly preferred some
of the positive reinforcement categories,
although they hardly used them. For example,
both groups ranked AM in the top five
during the expressed preference task but
used it less than 3% of the time (observed
use rank, black teachers = 8, white
teachers = 7). Rapport-Praise was the
overwhelming choice as the most effective
positive reinforcement, but it was used less than
6% of the time. Thus, while the data indi- 
cated that few differences exist between 
black teachers and white teachers in posi- 
tive reinforcement emission, it may also be 
said that teachers as a whole do not act in 
accordance with their individual prefer- 
ences. In the majority of cases, teachers did 
not use those reinforcers they thought to be 
the most potent. 

A subjective interpretation of the data 
concerning observation of positive rein- 
forcement emission in classrooms is that 
teachers are more interested in reinforcing 
academic activity in traditional ways, 
mainly through what might be called the 
“distant reinforcers” (AAB, FA+, AI) than 
through “proximity reinforcers” (CRD, 
CRT, PC+) involving material rewards 
and close personal contact. This hypothesis 
is corroborated by both previous research 
investigating expressed preferences only 
(Bersoff, 1974)* and the present study, during
which it was noted that the majority of
teachers maintained physical distance from 
their students and reinforced without mov- 
ing from behind their desks. 

A final disappointing note concerning re- 
inforcement emission is that on the average, 
each child received about 1.75 positive rein- 
forcements per 30-minute observation pe- 
riod. With time out for a wide variety of 
nonacademic activity, one could be gener- 
ous and assign four hours of the school day 
to academic work. Thus, each child is the 
recipient of about 14 positive reinforce- 
ments per academic day. Not enough is 
known about schedules of reinforcement in 
the acceleration and maintenance of human 
classroom learning, but this amount would 
seem to be a rather lean schedule for the
production of academic behavior. It is
reasonable to assume that teachers are capable
of reinforcing at higher rates. When one
considers Samph's (1968) finding that
teachers tend to inhibit criticism when
aware of observers’ presence, the data
gathered in the present study may represent the
upper limits (without other kinds of
intervention) of positive teacher behavior.


* Further data are provided in a manuscript
entitled “Disparities in reinforcement
preferences between mediators and modifiers” by
the second author, which has been submitted
for publication.

Our main finding, that no substantial dif- 
ferences in amount and kind of positive re- 
inforcement occurred between black teach- 
ers and white teachers in the sample inves- 
tigated, supports the work of Meyer and 
Lindstrom (1969), who studied similar 
teacher behavior in Head Start classrooms. 
They concluded that “there [were] no real 
differences among teachers in the frequency 
with which they give approval ... to their 
youngsters [p. 24]." However, they also 
found that their sample of black teachers 
and white teachers distributed approval rel- 
atively evenly among the race and sex 
groupings of the children observed. That 
finding was certainly not corroborated by 
our study. 

The findings on the effects of race and
sex on positive reinforcement emission
are most enlightening, revealing something
akin to reverse prejudice. Black teachers 
reinforce white children more frequently 
than white teachers, and white teachers do 
just the opposite. Most of the differences 
are accounted for by the fact that teachers 
of both races reinforced opposite-raced boys 
significantly more than same-raced or 
same-sexed children; that is, teachers rein- 
forced males of the opposite race most fre- 
quently in comparison to other groupings. 
The subgroup receiving the next highest 
amount of positive reinforcement was males 
of the same race as the teacher. Both black
teachers and white teachers gave more
positive reinforcement to females of the
opposite race than to females of the same
race as themselves. The group least
reinforced was black females taught by
black teachers.

The racial differences are in line with 
previous research (Brown, Payne, Lanke- 
wich, & Cornell, 1970) which found that 
teachers gave more praise and less criticism 
to children of the opposite race. Brown et
al. (1970) studied only fully segregated
classrooms. However, results of the present 
study may lend themselves to developing a 
generalization that children of the race op- 
posite to that of the teacher are interacted 
with more favorably than children of the 
same race. 

The sex differences are not so readily 
comprehended. Baughman (1971) noted 
that black females outperform black males 
and closely match white males on many 
academic and intellectual measures. Thus, 
it is somewhat surprising that black females
receive far fewer teacher reinforcements
than any other subgroup. This finding is
most clearly evident in classes taught by 
black teachers. It is interesting to note in 
this regard that similar conditions are 
found in adult life where the black, female- 
headed family is at the bottom of the eco- 
nomic structure. 

In a final elaboration of the data, which 
we offer with great tentativeness, there is 
one small sign of racial biasing by black 
and white teachers. Although teachers tend 
to reinforce opposite-raced children more 
frequently, this is not true within the cate- 
gory of PC+. White teachers touch white 
children with greater relative frequency 
than black children, and black teachers 
touch black children with greater fre- 
quency. The difference is most pronounced 
with regard to black males and white males 
in groups taught by white teachers. Out of 
the pool of reinforcement received, PC+ 
was administered 6% of the time to white 
males and only 2% of the time to black
males in classes taught by white teachers. 
It was only within the PC+ category that 
such a reversal of reinforcement frequency 
was found. 

It should be noted, in conclusion, that the 
sample of teachers observed was all female. 
None of these results may be germane to 
male elementary teachers. But it would be
especially interesting to see if females were 
generally neglected by all teachers, or if 
this phenomenon was sex specific. In any 
case, as most teachers in the elementary 
grades are female, and assuming our
findings are replicable, whatever prejudice
women are experiencing may be a
consequence of “doing unto oneself.”


INCIDENTAL AND RELEVANT LEARNING WITH
INSTRUCTIONAL OBJECTIVES¹


PHILIPPE C. DUCHASTEL² AND BOBBY R. BROWN³
Florida State University 


It was hypothesized that one role of objectives in learning is to serve as 
orienting stimuli by which the learner can decide which material to 
concentrate on and which to pay less attention to. With a brief text 
to learn, 58 college students received either one half of the 24 objectives 
for the text or no objectives at all. As expected, the subjects with half 
of the objectives performed better than their counterparts without ob- 
jectives on the posttest items referenced to their objectives (relevant 
learning) and less well on the items not covered by their objectives
(incidental learning). These findings conflicted with previous research
results with respect to incidental learning; this could have resulted
from the fact that the subjects in the present study had practical
experience with an objective-referenced instructional model.


Much research has been done and is pres- 
ently being conducted on the effects of in- 
structional objectives in learning. One aspect 
of that research has addressed the question 
of whether providing students with advance 
knowledge of the instructional objectives for 
a unit of instruction facilitates their learning 
of that unit. A recently completed review of the
results obtained in this area (Duchastel &
Merrill, 1973) pointed to the great variability
in the conclusions drawn from these research efforts.

Although a number of studies have failed
to support the hypothesis that students
provided with objectives achieve more than
students unaware of the objectives, a
sufficient number of investigations have
confirmed the hypothesis to lead to an
affirmative opinion on the question. It would
therefore seem appropriate to now view the
issue on a more basic level and to investigate
various reasons why objectives could
possibly be helpful to students. The present
study addresses one aspect of this issue,
namely, that objectives facilitate student
learning by providing direction for that
learning.


¹ Much of the research reported here was
conducted while the authors were at the
Computer-Assisted Instruction Center, Florida
State University. The authors gratefully
acknowledge the cooperation of D. Hansen,
B. Kibler, F. J. King, and P. Merrill, who
reviewed a draft copy of this paper.

² Requests for reprints should be sent to
Philippe C. Duchastel, who is now at the Centre
d’Application des Média Technologiques à
l'Enseignement et à la Recherche (CAMTER),
Université du Québec à Montréal, B.P. 3050,
Succursale B, Montréal 110, Québec, Canada.

³ Now director of the Computer-Assisted
Instruction Center, University of Iowa, Iowa City.

This directive function of objectives can 
be viewed within the general framework of 
the theory evolving around the use of orient- 
ing stimuli (Rothkopf, 1970). Basically, ori- 
enting stimuli are thought to elicit inspection 
behaviors which in turn determine what is 
learned. Orienting stimuli should, therefore, 
focus the student's attention on the im- 
portant aspects of the content (whatever is 
so defined as important) and minimize his 
attention on the incidental or illustrative 
parts of the learning material. This focusing 
effect should increase performance on test 
items referenced to the important aspects of 
the material and decrease performance on 
those items which are referenced to the inci- 
dental aspects. 

The main findings from orienting research 
were that (a) inserting questions in reading 
material enhanced performance on question- 
relevant items in the posttest and (b)
performance on nonrelevant items (those not
referenced to inserted questions) was
generally improved through the use of questions
placed after the learning passage but was 
not improved through the use of preques- 
tions. In some cases (Frase, 1968; Frase, 
Patrick, & Schumer, 1970; Patrick, 1968), 
prequestions actually depressed incidental 
learning. Presumably, then, questions which 
are placed before the material focus the stu- 
dent’s attention on question-relevant ma- 
terial and not on the incidental material. 

With respect to instructional objectives, 
Morse and Tillman (1972) gave half of their 
52 subjects three of the six objectives de- 
veloped for a unit of instruction. The other 
half received no objectives. Overall, the sub- 
jects receiving the partial list of objectives 
performed significantly better on the test 
items referenced to these objectives than on 
items not related to these objectives. The 
subjects receiving no objectives performed 
equally well on either set of items. Inci- 
dental learning for the group with objectives 
was not adversely affected. 

Rothkopf and Kaplan (1972) also con- 
trasted the effects of objectives on inten- 
tional and incidental learning. The experi- 
mental groups provided with objectives per- 
formed better on intentional than on inci- 
dental learning. However, they also per- 
formed better on incidental learning than a 
control group not provided with objectives 
who were simply told to learn "everything" 
in the unit. 

The preceding two studies have found that
objectives, while enhancing relevant
learning, do not depress incidental
learning. This finding is somewhat
unexpected and in conflict with results obtained
with the use of prequestions. One possible 
explanation for these findings is that the 
subjects used in the two studies may not 
have been familiar enough with the role of 
objectives to fully use them in focusing their 
learning. It has been pointed out by a num- 
ber of researchers (cf. Tiemann, 1968) that 
the possible effects of objectives may not be 
detected in research in which the subjects 
have not fully accepted the idea that the 
posttest which they will be taking is directly
referenced to the objectives presented to
them. This consideration would seem to be
especially crucial in the issue with which we
are presently dealing. If a student thinks 
that his instructor might test him on all the 
material and not just the material delimited 
by the objectives, he is likely not to focus 
his attention on the objectives as much as he 
would otherwise. 

It would seem, therefore, that the ideal 
subjects to use in research on objectives are 
students who have had practical experience 
with objectives and criterion-referenced
testing in their academic courses. The pur- 
pose of the present research effort was to 
investigate the incidental-relevant learning 
hypothesis with such a group. It was ex- 
pected that not only would objectives en- 
hance relevant learning but that they would 
also depress incidental learning. 


METHOD 
Subjects 


A total of 58 college students participated in 
the study. These students were volunteers from a 
communication course at Florida State University 
and received course credit for their participation.⁴
The course in question was a mastery course or- 
ganized around a set of established objectives 
provided to the students in which each unit test 
was directly referenced to the unit objectives. The 
present study was conducted after the students had 
taken four unit tests so as to insure that the stu- 
dents were fully aware, during the experiment, of 
the role played by objectives in learning. The 
students had also had a lecture at the beginning of 
the course explaining to them how to proceed 
through the course using the objectives. 


Materials 


The instructional materials consisted of a 
slightly modified reading passage taken from a text 
entitled The Mushroom Handbook by Krieger 
(1967). The passage, which was 10 pages long (ap- 
proximately 2,400 words), was taken from the 
section entitled “Conditions Under Which Mush- 
rooms Grow and Thrive” and dealt with such 
aspects of development as food and temperature 
requirements, parasitism, fairy rings, etc. These 
materials were selected mainly because of their 
presumed unfamiliarity to the typical undergraduate
student and because they seemed quite typical
of the course material found at the college
level.

The instructional objectives used in this study
numbered 24 and were developed from an examination
of the passage. All objectives stated what
the student would be expected to do once he
finished studying the text. All of the objectives
were very specific and were related to the knowledge
category of Bloom's (1956) taxonomy of objectives.
The following two objectives were typical
of those presented: [After completing this unit,
you will be expected to] (a) give two examples of
plants which form a cooperative symbiosis with
fungi and (b) state the name for a plant's response
to gravity.


⁴ The authors are grateful to Bob Kibler, M.
Ron Basset, and M. Tom Porter, who made the
course available for research purposes and offered
valuable suggestions for the study.

The posttest was developed so as to reflect di- 
rectly the instructional objectives. One item was 
written for each objective, for a total of 24 items. 
All items were of a constructed response format 
which tapped recall rather than recognition. The 
items referenced to the two objectives presented 
above were as follows: (a) Give two examples of 
plants which form a cooperative symbiosis with 
fungi. (b) What is the name for a plant’s response 
to gravity? 

The objectives, text, and posttest items were 
reviewed by three colleagues of the authors in or- 
der to assure that each objective was clearly 
stated and that each posttest item was directly 
referenced to its appropriate objective. Minor re- 
visions in wording resulted from this review. 


Procedure 


Subjects were randomly assigned to two treat- 
ment groups. The first group received one half of 
the objectives. These 12 objectives had previously 
been randomly selected from the full list of objec- 
tives. The second group received no objectives and 
were instructed to learn everything in the text. 

The subjects were handed an instructional pack- 
age containing general directions, the objectives 
(for the group receiving half of the objectives 
only), and the learning passage. The subjects had 
a maximum of 30 minutes in which to study the 
passage, although each subject controlled his own 
study time. During the learning task, the subjects 
were permitted to review any section of the text 
at their discretion. The subjects who had received 
objectives were further permitted to refer to them 
freely. When satisfied with his learning, each sub- 
ject indicated in a space provided on his cover 
sheet the exact time, as indicated on the room 
clock. He then individually exchanged his in- 
structional package for the posttest and additional, 
nonrelated reading material to keep him occupied 
until all subjects had completed the experiment. 
Posttest directions for the group receiving half of 
the objectives indicated to the subjects that they 
should try to answer all items, whether or not 
they were referenced to their objectives. Directions 
to the group receiving no objectives were simply 
to try to answer all items. 


RESULTS 


The data collected in this study consisted 
of posttest scores and study latencies. The 
posttest scores were partitioned into two
subscores: the first of these was referenced
to the partial list of objectives received by 
the group receiving half of the objectives 
(relevant learning for that group) and the 
second subscore was referenced to the set of 
objectives not received by this group (inci- 
dental learning for that group). 

The Kuder-Richardson Formula 20 reliability
indices for these subscores were .68
and .64. It is recognized that the use of in- 
structional objectives and the implications 
thereof for a criterion-referenced approach 
should lead to the use of a criterion-based 
technique such as the one proposed by Liv- 
ingston (1972). However, because there was 
only one test item per objective and because 
no percentage-type criterion was utilized, 
such a technique could not be employed. 

Statistical contrasts using analysis of var- 
iance were made between the group receiving 
half of the objectives and the group receiv- 
ing none of the objectives. These were made 
independently for each of the subscores. On 
Subscore 1, a significant effect for the
availability of objectives was obtained (F = 12.4,
df = 1/56, p < .05, R² = 18%). Similar results
were obtained on Subscore 2 (F = 23.3, df =
1/56, p < .05, R² = 29%). From Table 1, it
can be seen that the group receiving half of 
the objectives performed better than the 
group receiving none of the objectives on 
Subscore 1, indicating that relevant learning 
was enhanced by the availability of objec- 
tives. Indeed, the difference of 2.3 points be- 
tween the groups accounted for 18% of this 
subscore's variance. Incidental learning, on 
the other hand, was depressed, as evidenced
by the means of 3.2 and 5.6, respectively,
for the group receiving half of the objectives
and the group receiving none of the
objectives. The difference of 2.4 between the
groups accounted for 29% of Subscore 2
variance.


TABLE 1
MEAN NUMBER CORRECT ON THE POSTTEST

No. objectives    Subscore 1ᵃ     Subscore 2ᵇ     Total score
per group          M      SD       M      SD       M      SD
Half              7.4    2.7      3.2    1.9     10.6    3.2
None              5.1    2.1      5.6    1.9     10.8    3.3

ᵃ Relevant learning for the group receiving half
of the objectives.

ᵇ Incidental learning for the group receiving
half of the objectives.
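
The reported proportions of variance can be checked against the reported F ratios. The sketch below assumes the standard one-way relation R² = F·df_effect / (F·df_effect + df_error); the function name is illustrative, and the paper does not state how its R² values were computed.

```python
def variance_explained(f_ratio, df_effect, df_error):
    """Proportion of variance accounted for by the effect in a one-way ANOVA."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# F ratios and degrees of freedom as reported for the two subscores.
r2_subscore_1 = variance_explained(12.4, 1, 56)
r2_subscore_2 = variance_explained(23.3, 1, 56)

print(f"Subscore 1: {r2_subscore_1:.0%}, Subscore 2: {r2_subscore_2:.0%}")
```

Under that assumption the two F ratios reproduce the 18% and 29% figures given in the text.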

The difference between the two groups 
with respect to total score was not signifi- 
cant at the .05 level (F < 1, df = 1/56), nor 
was the difference between the time (in min- 
utes) that each group spent studying the text 
(F = 1.5, df = 1/56). For the group receiv- 
ing half of the objectives, the mean was 20 
minutes (SD = 4.8), whereas it was 18.5 
(SD = 4.5) for the group receiving none of 
the objectives. These figures represent the 
combined time involved in reading the direc- 
tions, reading and referring back to the ob- 
jectives (for the group receiving half of the 
objectives only), and studying the text. 
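
The nonsignificant F ratios can be roughly recovered from the published means and standard deviations. The sketch below assumes two equal groups of 29 subjects (the paper gives only the total of 58 and df = 1/56) and uses rounded summary values, so the results are approximations rather than the authors' exact figures; the function name is illustrative.

```python
def f_from_summary(mean_a, sd_a, mean_b, sd_b, n_per_group):
    """One-way F for two equal-sized groups, computed from summary statistics."""
    pooled_variance = (sd_a ** 2 + sd_b ** 2) / 2      # valid for equal n only
    squared_se = pooled_variance * 2 / n_per_group     # variance of the mean difference
    return (mean_a - mean_b) ** 2 / squared_se         # F equals t squared


N_PER_GROUP = 29  # assumption: 58 subjects split evenly

# Study time in minutes (half-objectives group vs. no-objectives group).
f_time = f_from_summary(20.0, 4.8, 18.5, 4.5, N_PER_GROUP)

# Total posttest score, taken from Table 1.
f_total = f_from_summary(10.6, 3.2, 10.8, 3.3, N_PER_GROUP)

print(round(f_time, 2), round(f_total, 2))
```

With these inputs the study-time F comes out close to the reported 1.5 and the total-score F falls well below 1, in line with the text.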


Discussion 


The results confirm the hypothesis elab- 
orated for this research. The possibility 
that objectives have a focusing effect on 
learning seems to be supported by the fact 
that while the two groups did not appreciably
differ either in total posttest score or in
study time, they did differ on each of the
two subscores. The subjects who received 
half of the instructional objectives attained 
more of those objectives than their counter- 
parts not provided with objectives. They 
furthermore attained fewer of the nonpre- 
sented objectives than their counterparts 
without objectives. It can be inferred from
these results that they used the objectives 
provided them in order to focus their learn- 
ing on the relevant material (as perceived 
through their list of objectives) and to pay 
less attention to the incidental material 
(those parts of the material not referenced 
to their objectives). The subjects not pro- 
vided with any objectives, on the other hand, 
engaged their learning equally on all parts 
of the material. 

The results obtained in this study are in 
agreement with previous research (Morse & 
Tillman, 1972; Rothkopf & Kaplan, 1972) 
only with respect to relevant learning. In all 
three studies, objectives served to increase 
relevant learning. With respect to incidental 
learning, however, the present results con- 
flicted sharply with the previous results. 
Morse and Tillman (1972) found no significant
difference on incidental learning between
a group with half of the objectives and
a group without objectives. They concluded 
that objectives did not adversely affect inci- 
dental learning. Rothkopf and Kaplan 
(1972), on the other hand, found that objec- 
tives facilitated not only relevant learning 
but incidental learning as well. 

As expressed in the introduction, these 
differences could stem directly from the fact 
that each of the three studies was actually 
dealing with a different population. In the 
Rothkopf and Kaplan (1972) report, no 
mention was made about the familiarity of 
the subjects with respect to the role of ob- 
jectives in learning. It could be presumed 
that their subjects had little familiarity with 
learning objectives. Morse and Tillman 
(1972), on the other hand, trained a subset 
of their subjects on the use of objectives 
without apparent effect. In the present 
study, the participating subjects had practi- 
cal experience in using objectives in one of 
their academic courses, an experience which 
the subjects in the other two studies pre- 
sumably did not share. We could, therefore, 
be dealing with three distinct student popu- 
lations: one having no familiarity with ob- 
jectives and no experience with them, one 
familiar with objectives but lacking experi- 
ence with them, and one with direct previous 
experience with objectives. The results ob- 
tained in each of the three studies may, 
therefore, be generalized only to their re- 
spective populations. 

A further distinction between the studies 
was the type of learning which was involved 
in each of them. While the present study, as 
well as the Rothkopf and Kaplan (1972) 
study, used objectives subsumed mainly 
under the knowledge category of learning 
(Bloom, 1956), the Morse and Tillman 
(1972) objectives related to higher levels of 
learning (mainly Bloom’s category entitled
“Analysis”).

Time 

With respect to time, no hypothesis had 
been advanced in the present research. 
While the group receiving half of the ob- 
jectives spent slightly more time studying 
the passage, the difference was not great 
(approximately 7% more time).

The correlations between time and
performance were also low (about .25) and
identical across groups. Carver (1970)
strongly argued that the results obtained in 
orienting-stimuli research were generally 
confounded by time, since the subjects re- 
ceiving orienting stimuli usually spent 
slightly more time studying the material. 
This criticism, however, does not have any 
implication for the results of the present 
study, since what was being investigated 
was not an overall effect on performance but 
rather a differential effect on performance. 
Both groups, in fact, performed equally well 
on the total score. Furthermore, with respect 
to time, it is questionable whether the lab- 
oratory studies dealing with orienting stim- 
uli (including the present one) are repre- 
sentative of the situation involved in a 
regular academic setting. It could easily be 
expected that the effect of objectives on 
study time over an academic semester would 
be quite different from the effects obtained 
in laboratory studies of short duration. 


CONCLUSION 


The present research is seen as supportive 
of the hypothesis that objectives facilitate 
learning by both focusing the learning effort 
on relevant material and detracting atten- 
tion from incidental material. The results 
obtained, however, are directly generalizable 
only to the knowledge category of learning 
and should be replicated with other types of 
learning. 

Furthermore, the objectives utilized in 
this study were very specific objectives, 
which would be found in typical classroom 
situations only infrequently. It would be 
useful, therefore, to replicate these results in 
a setting more representative of the regular 
academic situation. 

It should be noted also that the results 
were obtained in a situation in which ob- 
jectives were developed from an existing 
text. One could expect different results in a 
situation in which objectives were developed 
first and then instructional materials were 
developed around the objectives. Indeed, 
much less incidental material would be pres- 
ent in such a case. 

The present study once again points to the 
requirement that researchers in the field 
of instructional objectives ensure that their 
subjects are familiar with objectives and 
actually use them if the results of their re- 
search are to be generalizable to an appro- 
priate population. This factor was consid- 
ered as one strong reason for not finding 
expected results in some of the previous re- 
search efforts dealing with objectives 
(Duchastel & Merrill, 1973). 

Subjects answered questions while reading the original-learning and 
interpolated-learning passages in the usual retroactive inhibition de- 
sign. In Experiment 1, the inserted questions in the original-learning 
and related interpolated-learning passages were concerned with topics 
common to both passages, but in the case of the interpolated-learning 
passage, they indicated a competing response. In Experiment 2, the 
inserted questions for each passage covered material relevant only to 
that particular passage. In both cases, the presence of questions im- 
proved retention for material directly related to the questions but 
not for incidental information, and their presence had no effect on 
the amount of retroactive inhibition. 


Recent studies in which the similarity 
relationship between the original-learning 
passage and the related interpolated-learn- 
ing passage has been clearly specified and 
paralleled in the test items of the original- 
learning test have demonstrated that inter- 
ference effects do operate in the retention 
of connected discourse (Anderson & My- 
row, 1971; Crouse, 1971). Significant retro- 
active inhibition was obtained both when 
there was no measure of either original 
learning or interpolated learning prior to re- 
call and when measures of both original 
learning and interpolated learning were 
made prior to recall (Crouse, 1971). Elimi- 
nation of output had no effect on the 
amount of retroactive inhibition, but this 
comparison was cross-experimental and had 
different time intervals between original 
learning and recall. The presence of an im- 
mediate test after original learning but not 
after interpolated learning substantially 
improved delayed retention and sharply re- 
duced interference effects (Anderson & My- 
row, 1971). However, given that the level 
of original learning is increased by the 
presence of an immediate test even when 


1 Requests for reprints should be sent to Basil 
S. Walker, Student Counseling and Research Unit, 
University of New South Wales, P.O. Box 1, 
Kensington N.S.W. 2033, Australia. 


no knowledge of results is provided (Rode- 
rick & Anderson, 1968), the reduction of 
interference is predictable from the rela- 
tionship between the degree of original 
learning and that of interference (Slamecka 
& Ceraso, 1960) without any assumption of 
distinct test effects. 

That the insertion of questions into both 
original-learning and related interpolated- 
learning passages might affect the amount 
of retroactive inhibition is suggested by the 
finding that the presence of test-like events 
in prose learning goes beyond direct-in- 
structive effects and produces change in 
performance on reading material not di- 
rectly related to the text components on 
which the test items were based (Rothkopf, 
1970). If inserted questions are interpreted 
as having a review- rather than a “for- 
ward"-shaping function, then requiring sub- 
jects to answer questions during both 
original-learning and interpolated-learning 
passages may promote the process of "in- 
tegrative reconciliation" (Ausubel, 1963) 
and thus reduce interference. 

The purpose of the present study was to 
determine whether the amount of retroactive 
inhibition in prose learning would be af- 
fected by the presence of inserted questions 
in both original-learning and interpolated- 
learning passages. 

QUESTIONS AND RETROACTIVE INHIBITION 


EXPERIMENT 1 


Method 


Passages and tests. The original-learning and related 
interpolated-learning passages, each describing a fictitious 
African tribe, were those used previously by Anderson and 
Myrow (1971).² Both passages covered the same topics, and 
parallel information was divided into three categories: same, 
unrelated, and conflicting. There were approximately equal 
amounts of the three types of information. The criterion test 
on the original-learning passage consisted of a 30-item, 
short-answer test and the same 30 items in a multiple-choice 
format. There were three types of test items corresponding 
to the three categories of information. Facilitation items 
contained an item stem which was represented in substantially 
the same form in both the original-learning and related 
interpolated-learning passages and which would be answered in 
the same way on the basis of either passage. Neutral items 
concerned material discussed only in the original-learning 
passage. Interference item stems were represented in both 
passages, but the responses associated with the items were 
different in the two passages. All of the interference items 
in the multiple-choice test included the specific competing 
response from the related interpolated-learning passage as 
one of the distractors. The multiple-choice test was corrected 
for guessing using the formula R − W/3, and the short-answer 
test was scored on the basis of a predetermined key which 
specified the limits of acceptable answers. 
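
The correction for guessing mentioned above can be illustrated with a short sketch. The source gives only the formula R − W/3; the assumption here is that this is the usual correction R − W/(k − 1) for items with k = 4 response options, and that omitted items count as neither right nor wrong. The function and parameter names are hypothetical.

```python
def corrected_score(right: int, wrong: int, n_options: int = 4) -> float:
    """Correction-for-guessing score: R - W / (k - 1).

    Assumption: k = 4 options per item, which reproduces the R - W/3
    formula quoted in the text. Omitted items are simply not counted.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    if right < 0 or wrong < 0:
        raise ValueError("counts must be non-negative")
    return right - wrong / (n_options - 1)


# Example: 20 right and 6 wrong out of 30 items (4 omitted).
example = corrected_score(20, 6)
```

Under this rule a subject who guesses blindly on four-option items gains, on average, nothing from the guesses, since one right answer is offset by three wrong ones.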

The unrelated interpolated-learning material was a passage 
on pestilence. The original-learning, related 
interpolated-learning, and unrelated interpolated-learning 
passages contained 2,240, 2,190, and 2,475 words, respectively. 

Each learning passage was divided into three approximately 
equal sections. Each section within the original-learning 
passage and the related interpolated-learning passage 
contained roughly the same proportion of same, unrelated, and 
conflicting information. Questions, typed on a separate page 
and in short-answer form, were inserted after each section; 
four questions after Section 1 and three questions after 
Sections 2 and 3. The 10 interference items from the 
original-learning test served as inserted questions for the 
original-learning and related interpolated-learning passages, 
and 10 new questions were prepared for the unrelated 
interpolated-learning passage. Although the same questions 
appeared in the original-learning and related 
interpolated-learning passages, the set of questions after 
each section was different because similar topics were not 
covered in corresponding sections in the two passages. 

Design. The design was a 2 × 2 × 2 × 3 analysis of variance 
with repeated measures on the last two factors. The 
between-subjects factors were presence or absence of inserted 
questions and type of interpolated-learning passage (related 
or unrelated). The within-subjects factors were response mode 
(short answer and multiple choice) and item type 
(facilitation, neutral, and interference). 

2 The author is indebted to R. C. Anderson for permission to 
use learning passages and tests. 

Procedure. Both original- and interpolated- 
learning passages, with or without inserted ques- 
tions, were placed in a single booklet and read in 
a single study session. Instructions preceding the 
passages told the subjects that they were to read 
carefully, rereading any difficult parts, but that 
they were not to turn back after having completed 
a page. They were informed that they would be 
required to answer questions at various points in 
the material or, in the absence of inserted ques- 
tions, that they would be examined in the material 
at some later time. At the end of the last page 
of each section and at the bottom of each page of 
questions, the subject recorded, from a stop watch, 
the time taken to read that section or complete 
that page of questions. There was no time limit on 
reading or answering questions. 

Seven days after their study session, the subjects 
took the short-answer and multiple-choice original- 
learning tests. Ten minutes was allowed for each 
form of the test. 

Subjects. The subjects were 80 first-year under- 
graduate arts students (44 males and 36 females) 
who volunteered for participation in the study. 
Subjects were randomly assigned to the four ex- 
perimental conditions with the restriction that the 
ratio of males to females in each condition be the 
same. Most subjects were run in small groups 
of 6 to 10; some were run 
individually. 


Results 


Original learning. In terms of the time 
taken to read the original-learning passage, 
there was no significant difference for the 
main effects of inserted questions and type 
of interpolated-learning passage nor signifi- 
cant interaction between these effects (in 
all cases, F < 1.00, df = 1/76). Considering 
only the two groups who had inserted ques- 
tions, there was no significant difference in 
the percentage of correct responses given to 
the inserted questions in the original-learn- 
ing passage. The mean percentage correct 
for the unrelated interpolated-learning 
group was 93.0% and for the related inter- 
polated-learning group, 93.5% (F < 1.00, 
df = 1/38). 

Delayed recall. The mean percentages 
correct on parts of the delayed test for each 
experimental group appear in Table 1. 

TABLE 1 
MEAN PERCENTAGE CORRECT ON THE DELAYED ORIGINAL-LEARNING TEST 

                                       Short answer                         Multiple choice
Group                             n    Facilitation  Neutral  Interference  Facilitation  Neutral  Interference

Related interpolated learning
  No inserted questions           20   46.0          45.0     39.0          60.7          62.0     47.8
  Inserted questions              20   43.5          48.0     67.5          63.0          60.5     66.7
Unrelated interpolated learning
  No inserted questions           20   35.0          52.5     52.0          53.3          62.2     67.8
  Inserted questions              20   42.0          47.5     76.5          55.7          60.2     84.7

There was no overall effect of interpolated learning, but the 
Interpolated-Learning Passage × Item Type interaction was 
significant (F = 16.81, df = 2/152, p < .01). There was an 
overall effect of inserted questions (F = 7.83, df = 1/76, 
p < .01), but the Interpolated-Learning Passage × Item Type × 
Inserted Questions interaction, represented in Figure 1, was 
not significant. The interaction of interpolated-learning 
passage and item type conforms to the interference model; the 
expected retroactive inhibition on the response-different 
items and facilitation on the response-same items was 
obtained. The presence of inserted questions increased 
retention of interference items by nearly the same amount in 
the related interpolated-learning and unrelated 
interpolated-learning conditions and, except for small 
nonsignificant changes in the unrelated interpolated-learning 
condition (see Figure 1), had no effects on facilitation or 
neutral items. 

[Figure 1 not reproduced: percentage correct (vertical axis) 
by item type (facilitating, neutral, interfering) for four 
groups: unrelated IL, no IQ; related IL, no IQ; unrelated IL, 
IQ; related IL, IQ.] 

FIGURE 1. Percentage correct in Experiment 1 on the delayed 
original-learning test as a function of the 
interpolated-learning passage (IL), the presence or absence 
of inserted questions (IQ), and the item type. 

Neither the Interpolated-Learning Passage × Response Mode × 
Item Type interaction nor the Inserted Questions × Response 
Mode × Item Type interaction was significant, but retroactive 
inhibition was much more marked on the multiple-choice than 
on the short-answer portion of the delayed test. 
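
The degrees of freedom reported above can be checked against the design. The sketch below assumes the standard degrees-of-freedom rules for a mixed (split-plot) analysis of variance; the function names are hypothetical and the code is offered only as a consistency check, not as the author's computation.

```python
from math import prod
from typing import Sequence


def effect_df(levels: Sequence[int]) -> int:
    """df for a main effect or interaction: product of (levels - 1)."""
    return prod(k - 1 for k in levels)


def between_error_df(n_subjects: int, between_levels: Sequence[int]) -> int:
    """Error df for purely between-subjects effects: N minus number of groups."""
    return n_subjects - prod(between_levels)


def within_error_df(n_subjects: int, between_levels: Sequence[int],
                    within_levels: Sequence[int]) -> int:
    """Error df for effects that involve the listed within-subjects factors."""
    return between_error_df(n_subjects, between_levels) * effect_df(within_levels)


# Experiment 1: 80 subjects, 2 x 2 between-subjects groups.
inserted_questions_df = (effect_df([2]), between_error_df(80, [2, 2]))
# Passage (2 levels, between) x Item Type (3 levels, within) interaction.
passage_by_item_df = (effect_df([2, 3]), within_error_df(80, [2, 2], [3]))
```

These give 1/76 for the inserted-questions main effect and 2/152 for the Interpolated-Learning Passage × Item Type interaction, matching the values in the text. With the 74 subjects of Experiment 2 the same rules give 2/140 for that interaction.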


EXPERIMENT 2 


Since in Experiment 1 the interference items from the 
original-learning test served as inserted questions in the 
original-learning and related interpolated-learning passages, 
any nondirect-instructive effects of the inserted questions 
on retroactive inhibition were confounded with 
direct-instructive effects. In Experiment 2, the questions 
inserted into the original-learning, related 
interpolated-learning, or unrelated interpolated-learning 
passages were concerned with information present only in that 
particular passage. 


Method 


Instead of the 10 interference items from the 
original-learning test, the 10 neutral items were 
inserted into the original-learning passage. Ten 
new questions, concerned with information con- 
tained in the related interpolated-learning passage 
only, were prepared and inserted into that passage. 
In all other respects, the materials and designs were 
the same as in Experiment 1. 

Subjects and procedure. The subjects were male 
high school students who were in Grade 11. 
Seventy-four subjects were present for both the 
study session and the delayed-recall test. Sub- 
jects within each of the two Grade 11 classes in- 
volved in the study were randomly assigned to 
one of the four experimental conditions by passing 
out material stacked in a random order. Except 
for the fact that the time taken to read each sec- 
tion of each passage and to answer the inserted 
questions was not recorded, the rest of the proce- 
dure was the same as in Experiment 1. 


Results 


The significant effects were the same as in Experiment 1. 
The related interpolated-learning passage produced 
retroactive inhibition (F = 26.37, df = 2/140, p < .01, for 
the Interpolated-Learning Passage × Item Type interaction) 
but, again, the Interpolated-Learning Passage × Inserted 
Questions × Item Type interaction, shown in Figure 2, was not 
significant. 


Discussion 

Although the presence of inserted questions 
improved delayed recall, it had no effect 
on the amount of retroactive inhibition, 
and the improvement in delayed recall 
was restricted to the material on which the 
inserted questions were based. 

In further confirmation of earlier studies 
(Anderson & Myrow, 1971; Myrow & An- 
derson, 1972), retroactive inhibition was 
attributable to response competition rather 
than to response unavailability. Rather 
than retroactive inhibition being greater on 
the short-answer than on the multiple- 
choice test, as required by the response 
availability hypothesis, it was greater on 
the multiple-choice test, although not sig- 
nificantly so. Moreover, analysis of errors 
on the multiple-choice interference items 
showed that for the related interpolated- 
learning groups, 60.0% of the errors in the 
no-inserted-questions conditions and 68.0% 
of the errors in the inserted-questions condition resulted 
from choosing the specific competing interpolated-learning 
passage distractor. The corresponding figures for the 
unrelated interpolated-learning groups were 21.7% and 17.8%. 

[Figure 2 not reproduced: percentage correct (vertical axis) 
by item type (facilitating, neutral, interfering) for four 
groups: unrelated IL, no IQ; related IL, no IQ; unrelated IL, 
IQ; related IL, IQ.] 

FIGURE 2. Percentage correct in Experiment 2 on the delayed 
original-learning test as a function of the 
interpolated-learning passage (IL), the presence or absence 
of inserted questions (IQ), and the item type. 

The improvement in performance in de- 
layed recall as a result of the presence of 
inserted questions is attributed to the 
strengthening of specific associative con- 
nections. In the Anderson and Myrow 
(1971) study, it was concluded that com- 
pleting an immediate test primarily influ- 
ences response availability rather than spe- 
cific associative connections because the 
immediate test differentially affected per- 
formance on short-answer and multiple- 
choice tests. However, in the Anderson and 
Myrow study, the immediate test included 
multiple-choice items, whereas in the pres- 
ent study the inserted questions were in a 
short-answer format and thus could only 
affect items for which responses were al- 
ready available. 

The increase in performance due to the 
presence of short-answer inserted questions 
was substantial, but it is possible that fur- 
ther improvement could have been obtained 
by including multiple-choice items. However, 
if the multiple-choice questions were inserted 
into both original-learning and 
interpolated-learning passages, there should 
be no change in the amount of retroactive 
inhibition, since this was attributed to re- 
sponse competition rather than to response 
unavailability. 

The tendency for subjects to spend less time studying the 
interpolated-learning passage than studying the 
original-learning passage (Anderson & Myrow, 1971) is likely 
to reduce interference effects. The presence of inserted 
questions in both passages did not reduce this tendency. In 
both the inserted-questions and the no-inserted-questions 
conditions, the interpolated-learning passage was read at a 
greater number of words per minute than was the 
original-learning passage. However, the finding in this study 
that there is no interaction between test effects and 
interference effects means that tests can be used to 
determine the extent to which the levels of learning in the 
original-learning and interpolated-learning passages differ, 
with no fear of confounding test effects and interference 
effects. 

THE ALGORITHMIC APPROACH TO CURRICULUM CONSTRUCTION: 
A FIELD TEST IN MATHEMATICS¹ 

This study constitutes a field test of an algorithmic approach to cur- 
riculum construction. First the method was used successfully to charac- 
terize the content inherent in a mathematics textbook in terms of rules 
(algorithms). The use of higher-order rules allowed a 40% reduction in 
total rules. Two rule-based curricula were compared experimentally. 
The discrete (D) rules curriculum consisted of all 303 (lower-order) 
rules. The higher-order (H) curriculum included 169 of these rules 
plus 5 higher-order rules. The Curriculum H subjects performed as well 
with the eliminated rules as did the Curriculum D subjects who learned 
the rules directly, and they performed significantly better on new tasks 
beyond the scope of either curriculum. In short, Curriculum H subjects 
were taught less but learned more. 

Curriculum construction has traditionally 
been an artistic endeavor. Even today, the 
vast majority of texts and new curricula are 
developed almost exclusively on the basis of 
the curriculum constructor’s subject-matter 
knowledge and professional know-how. 

During the 1960s, a strong technological 
counterforce developed under the leadership 
of behavioral scientists. The basic position 
taken was that objectives must be stated in 
behavioral (operational) terms so as to 
make it possible to determine, through 
testing, whether learners have achieved 
individual objectives. As a result, a healthy 
debate developed between proponents of 
behavioral objectives (e.g., Gagné, 1970; 
Lipson, 1967; Mager, 1962; Popham, 1969; 
Tyler, 1964) and others who raised cautions 
concerning their use (e.g., Atkin, 1968; Ebel, 
1970; Eisner, 1967). 

In recent publications, the positions taken 
have become increasingly more flexible (e.g., 
Glaser, 1973; Resnick, 1972; MacDonald- 
Ross, 1973). A statement by Scandura 
(1971) summarizes much of the current 
view: “It is felt that complete reliance on 
operationally defined objectives has led 
some to fragmented curricula, curricula 
based on discrete bits of knowledge [p. 4]." 

Elaborating on this view, Scandura 
(1972) has identified two basic inadequacies 
of the behavioral objectives approach con- 
sidered in its simplest form: (a) The ap- 
proach deals only with observable behavior 
and says little about how that behavior is 
to be generated, and (b) it provides no 
systematic way of dealing with 
interrelationships among the identified objectives, or 
equivalently, of building transfer into a 
curriculum. 

Regarding Point a, the distinction be- 
tween the behavior of a subject and the 
knowledge (rule) that makes that behavior 
possible is fundamental. It can easily be 
proven mathematically that if just one rule 
exists for generating a class of behaviors, 
then there is an infinite number of other 
rules that will do the same thing (e.g., see 
Rogers, 1967). This fact is important in 
curriculum planning because in practice 
there is almost always more than one viable 
way of approaching a task. The subtraction 
methods of borrowing and equal additions, 
for example, are both widely used. 

Regarding Point b, it is clearly an 
impossible task, with any but the most 
trivial curricula, to explicitly teach the 
learner all that the curriculum con- 
structor wants him to know. The limitations 
imposed by time and the capacity of the 
learner to absorb and retain information 
make this impractical. Some attention to 
interrelationships would seem almost essen- 
tial. 

One approach to this problem is based on 
learning hierarchies (e.g., Gagné, 1970; 
Resnick, Wang, & Kaplan, 1970). As is well 
known, this approach makes use of task 
analysis (e.g., Miller, 1962) as a means of 
determining subordinate tasks. Subordinate 
tasks are prerequisite to so-called Higher 
Order 1 tasks in the sense that transfer to 
Higher Order 1 tasks frequently occurs once 
all of the prerequisites are learned. [It may 
be noted parenthetically that implicit in 
any specific task analysis is a specific 
underlying rule (which is not necessarily, 
and often is not, explicit in the analyst's 
mind). Since different rules may underlie 
the same task (as indicated above) it fol- 
lows that any given task, theoretically 
speaking, may be task analyzed in any 
number of different ways (e.g., consider task 
analyses of subtraction based on borrowing 
and equal additions).] 

In the present study, we have adopted a 
second (algorithmic) approach to the 
problem of transfer (Scandura, 1972). This 
approach bears some relationship to learning 
hierarchies, but it includes an important 
conceptual generalization that has often 
gone undetected because of the common use 
of the descriptor “higher order.” 

This approach is based on a recent theory 
of structural learning (Scandura, 1973) in 
which Higher Order 2 rules may operate on 
other rules (e.g., subordinate ones) to 
generate what in task analysis are Higher 
Order 1 rules. [Mathematically speaking, 
the distinction is precisely that between a 
function (Higher Order 2) and a function 
value or output (Higher Order 1).] Unlike 
the “directions,” which are sometimes 
necessary in moving from one level in a hierarchy 
to another, Higher Order 2 rules operate on 
classes of rules and are not limited to rules 
in particular hierarchies. Higher Order 2 
rules may operate between various levels 
of any one of a class of hierarchies, 
including hierarchies that appear superficially 
quite different. Consider, for example, the 
following two, simple Higher Order 1 tasks: 
(a) Given a certain number of yards, find 
the equivalent number of inches; and (b) 
Given an airplane (on the ground), get it 
up in the air and then back down (safely). 
One way to analyze these tasks is to break 
each task into component parts: (a) one 
subordinate task for converting yards into 
feet and another for feet into inches and 
(b) subordinate tasks for “taking off" and 
“landing.” Although they involve quite dif- 
ferent rules, the hierarchies have a common 
structure and, in particular, the Higher 
Order 1 tasks (more exactly, the rules for 
solving the indicated Higher Order 1 tasks) 
can be generated from the respective sub- 
ordinate task (rules) by applying a Higher 
Order 2 composition rule (for details, see 
Scandura, 1973b, pp. 213-218). This 
Higher Order 2 composition rule would be 
reflected in typical learning hierarchies by 
parallel but different “instructions.” In ef- 
fect, a single, Higher Order 2 rule can take 
the place of a potentially infinite class of 
separate “instructions.” 
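
The composition idea described above can be sketched in code. This is only an illustration under stated assumptions: a rule is modeled as a one-argument function, and the names `compose`, `yards_to_feet`, and `feet_to_inches` are hypothetical and do not come from the source.

```python
from typing import Callable

# A rule is modeled here as a function from a number to a number (assumption).
Rule = Callable[[float], float]


def compose(first: Rule, second: Rule) -> Rule:
    """Higher-order composition rule: operates on two rules and
    returns a new rule that applies them in sequence."""
    def composite(quantity: float) -> float:
        return second(first(quantity))
    return composite


def yards_to_feet(yards: float) -> float:
    """Subordinate rule: 1 yard = 3 feet."""
    return yards * 3.0


def feet_to_inches(feet: float) -> float:
    """Subordinate rule: 1 foot = 12 inches."""
    return feet * 12.0


# The rule for the yards-to-inches task is generated, not written separately.
yards_to_inches = compose(yards_to_feet, feet_to_inches)
```

The same `compose` function would apply unchanged to any other pair of rules whose output and input match, which is the sense in which one higher-order rule stands in for many separate instructions.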

Further, Higher Order 2 rules are not 
limited in their application to traditional 
hierarchies. For example, given a rule for 
converting inches to centimeters (1 inch = 
2.54 centimeters), it is a simple matter to 
envision a Higher Order 2 inverse rule which 
applies to such rules and generates their 
inverses. In this case, the output would be 
a rule for converting centimeters to inches 
(1 centimeter = 1/2.54 inch). For a discus- 
sion of these and other differences between 
Higher Order 1 and Higher Order 2 rules, 
see Scandura (1973a). In this article, all 
subsequent references to “higher order” are 
of Type 2. 
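
The inverse rule can be sketched in the same spirit. The sketch assumes that a conversion rule is fully described by a multiplicative factor; the class `ConversionRule` and the function `inverse` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionRule:
    """A unit-conversion rule described by a single factor (assumption)."""
    source: str
    target: str
    factor: float

    def apply(self, quantity: float) -> float:
        return quantity * self.factor


def inverse(rule: ConversionRule) -> ConversionRule:
    """Higher-order inverse rule: takes a conversion rule and
    generates the rule that converts in the opposite direction."""
    return ConversionRule(rule.target, rule.source, 1.0 / rule.factor)


inches_to_centimeters = ConversionRule("inch", "centimeter", 2.54)
centimeters_to_inches = inverse(inches_to_centimeters)
```

Applying `inverse` twice returns a rule equivalent to the original, up to floating-point rounding in the factor.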

The algorithmic approach to curriculum 
construction is based on these ideas and 
distinctions. Specifically, the approach builds 
on the notion that each behavioral objective 
corresponds directly to a class of tasks that 
can be computed (solved) by applying a 
rule or algorithm. It is further assumed that 
curricula (i.e., what is to be learned) can 
be represented in terms of finite sets of 
rules, including higher-order rules that 
operate on rules. In effect, the task of 
curriculum construction may be viewed as 
one of identifying a finite set of rules that 
provides an efficient account of the desired 
behaviors. The algorithmic approach is 
basically a method for devising curricula 
based on behavioral objectives and char- 
acterized in terms of rules and higher-order 
rules. 

The first step in this method is to select 
text materials to analyze. This, of course, 
involves making value judgments concern- 
ing the type of material to be considered. All 
of the tasks implicit in the text material are 
then identified and stated as behavioral 
objectives. Next, rules are written for solv- 
ing each of the tasks, and parallels among 
these rules are identified. Such parallels are 
indicative of common structure and provide 
a basis for devising higher-order rules. 
Finally, those rules that are derivable by 
application of the higher-order rules to 
other rules in the characterizing set may be 
eliminated. 
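
The final elimination step of the method can be sketched as follows. This is a deliberately simplified illustration, not the procedure used in the study: rules are represented as plain strings, a higher-order rule is a function from one rule to another, and only one-step derivations from already retained rules are considered. All names are hypothetical.

```python
from typing import Callable, Iterable, List

# A higher-order rule maps one rule description to another (assumption).
HigherOrderRule = Callable[[str], str]


def eliminate_derivable(rules: Iterable[str],
                        higher_order_rules: Iterable[HigherOrderRule]) -> List[str]:
    """Keep a rule only if no higher-order rule generates it
    from a rule that has already been kept."""
    higher_order = list(higher_order_rules)
    kept: List[str] = []
    for rule in rules:
        derivable = any(h(base) == rule for h in higher_order for base in kept)
        if not derivable:
            kept.append(rule)
    return kept


def swap_operation(rule: str) -> str:
    """Toy higher-order rule: replace addition terms by multiplication terms."""
    return rule.replace("additive", "multiplicative").replace("sum", "product")


rules = [
    "find sum; test additive identity",
    "find product; test multiplicative identity",
    "convert yards to feet",
]
reduced = eliminate_derivable(rules, [swap_operation])
```

Here the multiplicative rule is dropped because it can be regenerated from the additive one, while the unrelated conversion rule is retained. The result depends on the order in which rules are examined, which a fuller treatment would need to address.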

To summarize, the algorithmic approach 
provides a potential basis for overcoming 
the two aforementioned major limitations 
of the simple behavioral objectives 
approach. First, it makes specific what the 
subject must learn in order to demonstrate 
mastery on behavioral objectives and, thus, 
might provide a viable basis for instruction. 
Second, it makes explicit provision for the 
inclusion of higher-order relationships 
among objectives. Indeed, it provides a 
systematic way for possibly building trans- 
fer potential into a curriculum. 

Two studies are reported. The first study 
was strictly analytical in nature and was 
designed to determine the general feasibility 
of the algorithmic approach. Specifically, 
we wanted to determine the practicality of 
characterizing the knowledge inherent in 
a given mathematics text in terms of a 
finite set of rules, including higher-order 
rules. 

The second study was contingent upon 
the success of Study 1. The purpose of this 
study was to determine (a) whether making 
rules explicit provides a viable basis for 
instruction in the classroom, and (b) 
whether the introduction of higher-order 
rules provides an adequate basis for im- 
proving the ability of students to transfer. 


STUDY 1 


Feasibility of the Algorithmic Approach 
to Curriculum Construction 


In order to judge the feasibility of the 
approach, the following two criteria were 
established: 

1. A subjective appraisal of (a) the ease 
with which the tasks (behavioral objec- 
tives) inherent in the given text material 
could be identified, (b) the ease with which 
rules associated with each of the respective 
tasks could be written, and (c) the extent to 
which the tasks and rules identified were 
compatible with the approach taken in the 
text. 

2. The extent and ease with which the 
higher-order rules inherent in the text could 
be (a) identified and (b) used to eliminate 
those rules (and their corresponding tasks) 
that were derivable by application of the 
higher-order rules to other identified rules. 

Consideration was also given to the sheer 
numbers of tasks and rules involved and 
particularly to the extent these numbers 
could be reduced by the introduction of 
higher-order rules. 


Method 


Part 3 of Mathematics: Concrete Behavioral 
Foundations (Scandura, 1971) was chosen for 
analysis. (This book was later analyzed in its 
entirety and was published as a workbook, 
Scandura, Durnin, Ehrenpreis, & Luger, 1971.) 

The first step was to identify the individual 
tasks (behavioral objectives) inherent in the text. 
This was accomplished by going through the text 
paragraph by paragraph and asking what perform- 
ance capabilities might reasonably be expected of a 
student who had studied the material. For ex- 
ample, the following tasks were identified in 
Mathematics: Concrete Behavioral Foundations 
on pages 182 and 191, respectively. 

Task A. Given a whole number m, determine 
whether or not z is an additive identity for m. 

Task B. Given a whole number m, determine 
whether or not y is a multiplicative identity for 
m. 


The second step was to identify and eliminate 
redundancies in the tasks identified. Very few 
(less than 8) such redundancies were found in the 
text aside from prerequisites to other tasks. Pre- 
requisite tasks were not eliminated because it was 
felt desirable to maintain the original sequencing 
of ideas in the text. 

Third, one efficient rule was constructed for 
each task. The rules were written so as to be com- 
patible with the text materials. The rules written 
for the illustrative Tasks A and B were as follows: 


Rule A. Find the sum m + z and then the sum
z + m. If m + z = z + m = m, then z is an addi-
tive identity for m; if m + z ≠ m or z + m ≠ m,
then z is not an additive identity for m.

Rule B. Find the product m × y and then the
product y × m. If m × y = y × m = m, then y is
a multiplicative identity for m; if m × y ≠ m or
y × m ≠ m, then y is not a multiplicative identity
for m.
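
As an illustration only, Rules A and B can be written as two short procedures. The sketch below is in Python; the function names are hypothetical and do not appear in the analyzed text.

```python
def is_additive_identity(m: int, z: int) -> bool:
    """Rule A: z is an additive identity for m if m + z = z + m = m."""
    return m + z == m and z + m == m


def is_multiplicative_identity(m: int, y: int) -> bool:
    """Rule B: y is a multiplicative identity for m if m x y = y x m = m."""
    return m * y == m and y * m == m


if __name__ == "__main__":
    print(is_additive_identity(7, 0))        # zero added to 7
    print(is_multiplicative_identity(7, 1))  # 7 multiplied by one
```

Each function checks both orders of the operation, as the rules require, before accepting the candidate element.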


The fourth step was to look for higher-order 
relationships among the rules. These relationships 
were stated as tasks to be performed, and higher- 
order rules underlying these (higher order) tasks 
were then constructed. For example, Rules A and 
B are obviously related. The nature of this rela- 
tionship can be illustrated by the following task 
and its underlying higher-order rule. 


Task H. Given a rule for demonstrating that a
given set of numbers provides an instance of a
property (e.g., commutativity) under some opera-
tion, generate a corresponding rule involving
another operation.


Rule H. In the given rule, replace the original 
operation by the new operation and any "special" 
element (e.g., the identity) by its counterpart.


Fifth, the higher-order rules identified in Step 
4 were used to eliminate those tasks and corre- 
sponding rules that are derivable by application of 
the higher-order rules to other rules identified. 
Thus, for example, Rule B was eliminated inas- 
much as it could be generated by applying Rule 
H to Rule A.
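
The derivation of Rule B from Rule A by means of Rule H can be sketched as a function that operates on rules. This is a minimal Python illustration under the assumption that a rule can be represented as a template parameterized by its operation; the names identity_rule and rule_h are hypothetical.

```python
import operator
from typing import Callable

BinaryOp = Callable[[int, int], int]
Rule = Callable[[int, int], bool]


def identity_rule(op: BinaryOp) -> Rule:
    """Rule template: e is an identity for m under op if m op e = e op m = m."""
    def rule(m: int, e: int) -> bool:
        return op(m, e) == m and op(e, m) == m
    return rule


def rule_h(template: Callable[[BinaryOp], Rule], new_op: BinaryOp) -> Rule:
    """Sketch of Rule H: replace the original operation by the new operation."""
    return template(new_op)


rule_a = identity_rule(operator.add)          # Rule A, stated directly
rule_b = rule_h(identity_rule, operator.mul)  # Rule B, derived rather than listed
```

Because rule_b is generated on demand, it need not be stored as a separate entry in the curriculum, which is the sense in which Step 5 eliminates it.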


Results and Discussion 


The chapters analyzed lent themselves 
very naturally to a task-rule type of anal- 
ysis. The major requirements found neces- 
sary for such analysis were a thorough 
familiarity with the subject matter and a 
good working knowledge of the algorithmic
approach. 

Upon completion of Step 3, a list of 303 
tasks and their corresponding rules (one 
rule for each task) had been identified. A 
list containing 174 rules was obtained upon 
completion of Step 5. Five of these were 
higher-order rules acting on rules concerned 
with (a) ordered sets (ordinal numbers) 
and unordered sets (cardinal numbers), (b) 
well-defined operations, (c) properties of 
number systems (e.g., commutativity), (d) 
inverse operations, and (e) particulariza- 
tion (i.e., assigning values to variables). The 
first higher-order rule made it possible to 
eliminate 4 rules; the second, 17; the third, 
84; the fourth, 7; and the fifth, 22. 
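
The counts reported above can be checked with a few lines of arithmetic. The Python sketch below assumes, as the text states, that the final list of 174 includes the five higher-order rules; the variable names are illustrative.

```python
rules_after_step_3 = 303
eliminated_by_higher_order_rule = [4, 17, 84, 7, 22]
higher_order_rules_added = 5

total_eliminated = sum(eliminated_by_higher_order_rule)
rules_after_step_5 = (
    rules_after_step_3 - total_eliminated + higher_order_rules_added
)

print(total_eliminated, rules_after_step_5)  # prints "134 174"
```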

Although the process of identifying the 
“tasks” inherent in the text was time-con- 
suming, it was felt that the list of 303 tasks 
gave almost complete coverage of the mate- 
rial. Once the tasks were identified, there 
was little difficulty encountered in writing 
rules compatible with the approach taken 
in the text. This was partly due to the fact 
that the narrative and illustrative examples 
provided adequate guidance. The identifi- 
cation of the higher-order rules and the 
subsequent reduction of the list of 303 rules 
to the final list of 174 rules was the most 
difficult step. Some of the higher-order rules 
(e.g., ordered sets and unordered sets) were 
easier to identify than others (e.g., particu- 
larization), and the analysis was pursued 
just far enough to demonstrate the feasi- 
bility of the approach. Doubtless, a more 
intensive analysis would have resulted in 
a larger number and variety of higher-order 
rules. Overall, based on the criteria estab- 
lished, it was concluded that the algorithmic 
approach was feasible. 


STUDY 2


Evaluation of the Algorithmic Curriculum 


The second study, in contrast to the first, 
was experimental and involved a compari- 
son of two rule-based curricula. The first 
curriculum (D) was characterized in terms 
of a list of discrete tasks and rules for solv- 
ing these tasks, one rule for each task. The 
second curriculum (H) included the higher- 
order rules and all Curriculum D tasks and
rules except those derivable by application 
of the higher-order rules to other rules in 
Curriculum D (as described in Step 5, 
Study 1). 

The degree of learning evidenced by stu- 
dents trained in Curriculum D provided 
evidence concerning the hypothesis that 
making rules explicit provides a viable 
basis for classroom instruction. This hy- 
pothesis was tested directly in terms of 
mastery rather than by comparison with 
some arbitrarily defined control. 

Comparative performance of the stu- 
dents trained in Curricula D and H per- 
tained to the question of transfer. We hy- 
pothesized that students trained in the 
higher-order rules (Curriculum H) would 
perform on the Curriculum D tasks that had 
been eliminated from Curriculum H as well 
as the subjects who had been trained on the 
Curriculum D tasks. This was expected (ac- 
cording to the theory) because these tasks 
could be solved by using higher-order rules 
to derive solution rules for these tasks. 
Furthermore, we predicted that the Curric- 
ulum H students would perform signifi- 
cantly better on tasks not in Curriculum 
D that could be solved in the same way (i.e.,
by use of the higher-order rules). No dif- 
ference in performance was expected on 
tasks included in both curricula. 


Method 


Materials. The experimental materials were 
based directly on the analyses reported in Study 
1. The discrete rules curriculum (D) resulted upon 
application of Steps 1-3 of the algorithmic ap-
proach. The higher-order rules curriculum (H) in-
cluded Steps 4-5 as well. Curriculum D consisted
of 303 tasks and rules, and Curriculum H of 174
tasks and rules including 5 higher-order tasks and
rules. Although experimental materials were pre- 
pared for Chapters 5-9 of Mathematics: Concrete 
Behavioral Foundations, only those materials per- 
taining to Chapters 5, 6, and 7 were actually used 
in the experiment. 

The two curricula were reproduced in the form 
of workbooks with the following format: First, a 
statement of the task. Second, a rule statement 
in simple terms. Next, from three to five worked 
examples (depending on the experimenter’s judg-
ment as to task difficulty), and finally, 10 ex-
ercises. In addition, a set of task exercises for re-
view purposes was selected on the basis of the 
apparent difficulty subjects encountered during 
learning. All higher-order tasks were included in the
Curriculum H review. 

Pre- and posttests were also constructed. The 
pretest consisted of a sample of 40 exercises (stated 
exactly as they appeared in the workbooks) which 
(a) tested applications of rules found in both treat- 
ments, and (b) in the judgment of the experi- 
menter, were most likely to discriminate among 
subjects. The posttest consisted of a stratified 
random sample of 32 task exercises drawn from
the following categories: (a) tasks found in Cur- 
ricula D and H, (b) tasks found only in Curriculum 
D, (c) higher-order tasks found only in Curriculum 
H, and (d) tasks that were found in neither treat- 
ment but which theoretically could be derived 
from rules found in Curriculum H. The exercises 
were selected randomly from each of these cate- 
gories—10 each from Categories (a) and (b), and 
6 each from Categories (c) and (d). (Kuder- 
Richardson Formula 20 was used to obtain reli- 
ability coefficients of .86 for both the pretest and 
the posttest. High curricular (content) validity 
(Cureton, 1951) was assured by the method of 
construction used.) 
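
For readers unfamiliar with Kuder-Richardson Formula 20, the following Python sketch shows the standard computation on right/wrong item scores. The response matrix is hypothetical and is not data from this study; the sketch uses the population variance of total scores, which is one common convention.

```python
def kr20(responses):
    """KR-20 reliability for a subjects-by-items matrix of 0/1 scores."""
    n_subjects = len(responses)
    n_items = len(responses[0])

    # Variance of the subjects' total scores (population form).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_subjects
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: four subjects, three items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(example)
```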

Subjects and design. The subjects were 48 (16 
male, 32 female) Trenton State College summer 
school students enrolled in two sections of a course 
on teaching modern mathematics in the elementary 
grades. The first author was the instructor for both 
sections of the course. 

Pretest scores were used to assign 24 subjects 
to a high group and 24 subjects to a low group. 
Within each group, 12 subjects were randomly 
assigned to each curriculum.
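
The blocked assignment just described can be sketched as follows. This Python example assumes that the high and low groups were formed by ranking pretest scores and splitting at the midpoint, which the text does not state explicitly; the subject labels, scores, and seed are hypothetical.

```python
import random


def assign_subjects(pretest_scores, seed=0):
    """Split subjects into high/low halves, then randomize curricula within each."""
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=lambda sid: pretest_scores[sid], reverse=True)
    half = len(ranked) // 2
    groups = {"high": ranked[:half], "low": ranked[half:]}

    assignment = {}
    for level, members in groups.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        midpoint = len(shuffled) // 2
        for sid in shuffled[:midpoint]:
            assignment[sid] = (level, "D")
        for sid in shuffled[midpoint:]:
            assignment[sid] = (level, "H")
    return assignment


# Hypothetical pretest scores for 48 subjects (not the study's data).
scores = {f"S{i:02d}": (i * 7) % 25 for i in range(48)}
cells = assign_subjects(scores)
```

With 48 subjects, each of the four aptitude-by-curriculum cells receives 12 subjects.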

Procedure. During the first class meeting of 
each section, the subjects were told that the course 
was experimental and that, with the exception of 
unrelated outside reading, all of their work would 
be done individually and during class time. Then, 
the pretest was administered. The subjects had all 
the time they needed to complete the test. The 
highest score on the pretest was 24 exercises correct 
out of the possible 40. The median was 10 and 
the mean was 11.2. Forty-five of the 48 subjects had 
pretest scores below 20, and 26 had scores of 10 
or below. 

Each subject purchased a notebook at the be- 
ginning of the six-week term; all experimental work 
was done in these notebooks. The instructional 
workbooks and student notebooks were distributed 
at the beginning of each two-hour, five-minute class 


TABLE 1
NUMBER OF TASKS IN CHAPTERS 5 THROUGH 9

                      Chapter
Curriculum        5    6    7    8    9

Discrete rules   47   22   79   72   83
Higher order     32   17   46   44   35

period and collected at the end. The Curriculum D
and Curriculum H subjects met in separate class- 
rooms for three class periods per week. Each sub- 
ject was permitted to go on to the next task as 
soon as he had demonstrated success on a number 
of consecutive exercises equal to the number of 
illustrative examples constructed for that task. A 
daily record of each subject's performance was 
kept, as were anecdotal records indicating par- 
ticular difficulties or problems that arose. 
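
The advancement criterion described above (a run of consecutive correct exercises equal to the number of worked examples for the task) can be sketched in Python. The function name and the sample outcomes are hypothetical.

```python
def exercise_at_which_criterion_met(outcomes, n_examples):
    """Return the 1-based exercise number at which the subject may advance.

    outcomes   -- sequence of booleans, True for a correct exercise
    n_examples -- number of worked examples given for the task
    Returns None if the required run of consecutive successes never occurs.
    """
    streak = 0
    for number, correct in enumerate(outcomes, start=1):
        streak = streak + 1 if correct else 0  # an error resets the run
        if streak >= n_examples:
            return number
    return None
```

For a task with three worked examples, a subject who answers correct, wrong, correct, correct, correct would advance after the fifth exercise.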

During the first meeting of each week (begin- 
ning with Week 2), subjects were given a set of 
individualized review exercises; the exercises se-
lected were those exercises completed during the 
preceding two weeks which had given the sub- 
ject the greatest difficulty. The subjects were re- 
quired to work these exercises without the use of 
the workbook or their notebooks. On the final class 
meeting of each week, the review exercises and solu- 
tions were returned to the subjects. They were told
to check their solutions using their workbooks and 
notebooks and to correct any errors. As soon as 
this had been completed, the review materials 
were collected. 

During the final three weeks, class time was set 
aside to enable subjects to review and study the 
rules they had learned up to that time (45 minutes 
during Weeks 4 and 5; 3 hours during Week 6). 
The amount of time spent working on the tasks and 
on review was the same for all subjects. In those 
few cases where a subject missed a class, he was 
required to make up the time missed. 

At the end of the term, 14 of the 24 subjects in 
Curriculum D were working in Chapter 7, with 
the farthest advanced to Task 36. Two of the 24 
Curriculum H subjects completed Chapter 7, 
and 16 others were working in Chapter 7. All 
Curriculum H subjects completed the five higher- 
order tasks.

The posttest was given during the next-to-last 
class meeting. The subjects were given all of the 
time they needed and they were encouraged to do 
their best. 


Results and Discussion 


Mastery. The Curriculum D subjects 
were successful on 339 of the 362 posttest 
tasks on which they as individuals had been 
trained, for a mastery level of 94%. (The 
number of test items on which each subject 
had been trained was determined by ex- 
amination of his class workbook.) The Cur- 
riculum H subjects were successful on 183 of 
the 190 lower-order tasks (96%) on which 
they had been trained. It would appear that 
making rules explicit does provide a viable 
basis for instruction. 

On the higher tasks, Curriculum H sub- 
jects performed as expected at a signifi- 
cantly higher level (F = 7.37, df = 1,44, 
p < .01) than the Curriculum D subjects.
The overall means were 3.2 (Curriculum D) 
and 4.3 (Curriculum H) with a maximum 
score of 6. The data of the individual sub- 
jects showed that the Curriculum H subjects 
were successful on 103 out of the 144 post- 
test exercises (71.5%), whereas the Cur- 
riculum D subjects were successful on 77 
out of 144 (53.4%) exercises. These propor- 
tions differed significantly (arc sine trans- 
formation, Z = 2.9, p < .01; cf. Baggaley, 
1957). 
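
One common form of the arc sine test for two independent proportions is sketched below in Python. It is offered as an illustration of the general technique and is an assumption about the procedure: the exact computation described by Baggaley (1957) may differ, so the value returned need not reproduce the Z reported above.

```python
import math


def arcsine_z(successes_1, n_1, successes_2, n_2):
    """Z statistic for two proportions after the transform phi = 2 * asin(sqrt(p))."""
    phi_1 = 2 * math.asin(math.sqrt(successes_1 / n_1))
    phi_2 = 2 * math.asin(math.sqrt(successes_2 / n_2))
    # The transformed proportion has approximate variance 1 / n.
    standard_error = math.sqrt(1 / n_1 + 1 / n_2)
    return (phi_1 - phi_2) / standard_error


# Higher-order task counts reported in the text: 103 of 144 versus 77 of 144.
z_higher_order = arcsine_z(103, 144, 77, 144)
```

The transformation stabilizes the variance of a proportion, so that the same standard error applies whatever the underlying success rates.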

It is of interest, nonetheless, that the Cur- 
riculum D subjects, although not trained on 
the higher-order tasks, did perform success- 
fully 53.4% of the time. This suggests that 
some of the Curriculum D subjects may 
have known these (relatively simple)
higher-order rules prior to the experiment, 
while others may have been able to induce 
them as they worked through Curriculum 
D. In any case, the performance level did 
not approach that of the Curriculum H sub- 
jects, and it can safely be concluded that 
training on higher-order rules had a posi- 
tive effect. This gap undoubtedly would 
have been even greater had more sophisti- 
cated higher-order rules been introduced. 

Transfer. The hypotheses pertaining to 
transfer were equally conclusive. As ex- 
pected, Curriculum H subjects performed 
just about as well on tasks found only in 
Curriculum D as did the Curriculum D sub- 
jects who were trained on these tasks
directly. The Curriculum D subjects were
successful on 168 out of the 181 posttest
tasks (93%) in this category, and the Cur-
riculum H subjects were successful on 107
out of the corresponding 190 tasks (88%).
The difference between these two propor- 
tions was not significant (arc sine transfor-
mation, 0 < Z < .5, p > .05). The overall 
means were 8.3 (Curriculum D) and 8.0
(Curriculum H) with a maximum score of 
10 (F = 1.02, df = 1,44, p > .05). 

In addition, the Curriculum H subjects, 
as predicted, performed at a significantly 
higher level (F = 30.03, df = 1,44, p <
.001) than the Curriculum D subjects on
tasks beyond the scope of either curriculum. 
On the six tasks where solution rules could 
be derived from given rules via the higher- 
order rules, the obtained means were 3.1
(Curriculum D) and 4.1 (Curriculum H). 
The Curriculum H subjects were successful 
on 98 of the 144 transfer opportunities pro- 
vided (68%), and the Curriculum D sub- 
jects were successful on 74 of the 144 
(51.3%). These proportions differed signifi- 
cantly (arc sine transformation, Z = 2.9,
p < .01).

Relation between transfer and mastery on 
higher- and lower-order rules. It is also 
worth noting that the difference in group 
performance of about 17%-18% on the 
higher-order tasks was directly reflected in 
performance on the six transfer tasks that 
were neither in Curriculum D nor Curricu- 
lum H. There too the difference was about 
17%-18%. This observation suggests that 
subjects who had (directly or indirectly) 
learned a higher-order rule were able to 
apply it successfully to transfer tasks. In 
effect, it would appear that the availability 
of a suitable higher-order rule, together 
with appropriate lower-order rules, provides 
a sufficient basis for transfer to new tasks. 

A more detailed analysis showed that (a) 
of the 235 cases where subjects were success- 
ful on a higher-order task, they were suc-
cessful on 166 of the corresponding transfer 
tasks, giving 71% correct prediction; and 
(b) of the 101 cases where subjects were un- 
successful on a higher-order task, they were 
unsuccessful on 72 of the corresponding 
transfer tasks, again giving 71% correct 
prediction. These findings are generally 
compatible with a number of related “labo- 
ratory” experiments (Scandura, 1973b). 
Although predictability did not approach 
the levels obtained there (86%-100% accu- 
racy), these results demonstrate the robust- 
ness of the theory. 
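
The two prediction rates reported above can be recomputed from the stated counts. The Python sketch below uses only the figures given in the text; the helper name is hypothetical.

```python
def percent(hits, total):
    """Percentage of cases correctly predicted, rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Success on a higher-order task predicting success on the transfer task.
rate_given_success = percent(166, 235)
# Failure on a higher-order task predicting failure on the transfer task.
rate_given_failure = percent(72, 101)
# Both kinds of prediction pooled.
rate_overall = percent(166 + 72, 235 + 101)
```

Both conditional rates round to the 71% cited, and the pooled rate is essentially the same.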

Other data showed that transfer was not 
affected by the direct presentation on the 
transfer tasks of the necessary lower-order 
rules. (The rules to which the higher-order 
rules applied were presented directly on 
four of the six transfer tasks.) The Cur- 
riculum D subjects were successful on 48 
out of 96 transfer problems (50%) in which 
the needed lower-order rule was presented 
and 28 out of 48 transfer problems (54%)
in which the subjects were trained on the
needed lower-order rule but were not for-
mally presented with it on the test. Cor- 
respondingly, the Curriculum H subjects 
were successful on 66 out of 96 transfer 
problems (69%) and 32 out of 48 transfer 
problems (67%). Neither pair of proportions 
differed significantly (Z = .59, p > .05 for 
the Curriculum D differences and Z = .29, 
p > .05 for the Curriculum H differences). 
This suggests that memory was not an 
essential factor, at least under the present 
conditions where the level of mastery on 
the lower-order rules was about 95%. 


SUMMARY AND IMPLICATIONS


In summary, these results clearly show 
that rules provide a viable and explicit basis 
for instruction and transfer. The Curriculum 
H subjects not only had fewer rules to 
learn than did the Curriculum D subjects, 
but they were also able to solve tasks that 
the other subjects could not. The strong 
transfer effects were obtained even though 
the six-week course came to an end just as 
many of the Curriculum H subjects were 
reaching that portion of the workbook 
where the higher-order formulation had 
made it possible to eliminate large numbers 
of Curriculum D tasks (cf. Table 1). If time 
had permitted some of the subjects to com- 
plete Chapters 8 and 9, it seems quite pos- 
sible that an even greater difference might 
have been obtained in favor of the Cur- 
riculum H subjects in amount of material 
covered per unit of time. 

These results suggest that the algorithmic 
approach should be given serious considera- 
tion in planning future curriculum develop- 
ment. Curricula that are characterized in 
terms of rules and higher-order rules pro- 
vide an explicit basis for instruction and, 
even more important, make specific provi- 
sion for remote transfer, something which 
many subject-matter specialists feel is lack- 
ing in current curricula based on operational 
objectives. 

The present study, of course, demon- 
strates the viability of the algorithmic ap- 
proach only with respect to certain parts of 
mathematics. Further research is needed to 
determine the feasibility of the approach 
with other subject matters. A study on criti-
cal reading by Lowerre and Scandura
(1973) is particularly relevant in that it 
was based on a short-cut approximation of 
the algorithmic approach to curriculum 
development called dimensional analysis. 

It also should be emphasized that while 
knowledge can be rigorously characterized 
in terms of rules, this does not imply that 
knowledge must be imparted to students 
(e.g., young children) in this manner. As in 
this study, rules can be acquired by telling 
or by discovery from instances or, in other 
cases, by symbol juggling or concrete manip-
ulation. The choice is up to the teacher and 
depends on factors other than the particular 
knowledge in question. Instructional for- 
mats other than workbooks, of course, 
should also be explored. The important 
point is that if we know precisely what 
(rule) it is that we want a child to learn, 
then we can facilitate learning far better 
than if we do not. 


INTERNALIZATION OF FILMIC SCHEMATIC OPERATIONS IN
INTERACTION WITH LEARNERS' APTITUDES¹


GAVRIEL SALOMON²
The Hebrew University of Jerusalem, Israel 


Two major hypotheses were tested in three experiments. The first hy- 
pothesis was that youngsters can imitate and internalize filmic codes to 
be used subsequently as covert schematized mediators. The second hy-
pothesis was that learners with low, relevant aptitude scores profit 
more from films which model for them schematic operations to be 
internalized than do high-aptitude learners. Two kinds of operations 
were either modeled, short-circuited, or not shown at all, thus requir- 
ing subjects to activate them on their own. These operations were zoom- 
ing in on details (Experiments 1 and 2) and laying out of solid objects 
(Experiment 3). Subjects were 80 eighth graders in Experiment 1, 56 
eighth graders in Experiment 2, and 42 ninth graders in Experiment 3. 
Results of two of the experiments supported the first major hypothesis, 
thus showing that internalization of schematic filmic codes is possible
and leads to improved performance on related transfer tasks and 
ability tests. Aptitude-treatment interactions emerged in all three ex- 
periments, as expected by the second major hypothesis. It was con- 
cluded that filmic modeling of schematic operations can lead to their 
internalization, thereby improving the ability of low scorers to use the 
operations as covert mental skills. Verbal ability was not found to be a
necessary mediator in this kind of learning. 


Bruner postulated on a number of occa-
sions (Bruner, 1961; Bruner et al., 1966)
that for communication systems, tools, or
media to be effective, they must produce ap-
propriate internal counterparts in their
users’ minds. Berlyne (1965) stated simi-
larly that signs appear to have a dual func-
tion: they serve both for overt communi-
cational and for covert representational
purposes. More recently, Olson (1972) sug-
gested that while different forms and media
of instruction convey knowledge that maps
onto a common knowledge system, they di-
verge as to the mental skills they cultivate.


¹ The research reported here was partly sup-
ported by a grant from the American Psychological
Foundation and partly by the Israel Institute of
Applied Social Research. The author is grateful to
Michal Siman-Tov, Deborah Malveh, and Avra-
ham Cohen for their assistance in carrying out the
experiments and to Susan Harter at Yale Univer-
sity for her helpful suggestions.

² Requests for reprints should be sent to Gav-
riel Salomon, School of Education, The Hebrew
University, Jerusalem, Israel.


These ideas are far from being new in the
realm of language. In fact, much of the
verbal training given to lower-class children 
is based upon the assumption that improve- 
ments in one’s communicational capacity 
are internalized and lead, therefore, to better 
intellectual functioning (e.g., Blank & Solo- 
mon, 1968). 

The hypothesis that communication codes 
can be internalized to be used as “mental 
tools” need not be limited to language. 
Media of communication, it is often 
claimed, utilize different symbolic codes of 
a nonverbal nature (e.g., Eisner, 1970; Gom- 
brich, 1972). It is difficult to prove that 
these symbolic codes constitute different 
grammars in the regular sense. However, it 
can be shown that they do consist of specific 
and quite diverse symbols and modes of 
combining them (e.g., Gordon, 1969). More- 
over, some media, particularly film and TV, 
contain in their symbolic systems specific 
codes to represent relatively unique trans- 
formations in space and time (e.g., slow mo- 
tion, the zoom of a camera, rotations, etc.).
Given the recent accumulation of findings
pertaining to the role of nonverbal mediation
(imagery) in mental development (Piaget
& Inhelder, 1971), memory, and problem 
solving (Arnheim, 1970), one may wonder 
whether these can be internalized from the 
communication media to which one is ex- 
posed. 

This would be one way to examine Bru- 
ner’s (1961) postulation about the develop- 
ment of internal counterparts to media and 
to test Olson’s (1972) hypothesis about the 
mentally cultivating function of media and 
modes of instruction. 

The hypothesis we wished to test is that 
one learns to use covertly in his representa- 
tional system certain operational schemes 
which he encounters as part of a medium’s 
language of communication. Thus, for exam- 
ple, we would expect someone to better 
visualize a “slowed” operation after inten- 
sive exposure to films which show “slowed 
down” movements. Once such a scheme is 
internalized, it should serve as a mediating 
mechanism which facilitates performance 
in relevant problem situations (Lesser, 
1972). The “internal counterpart” to which 
Bruner and Berlyne refer would then be- 
come exactly that: What is originally a part 
of a medium’s language is now transformed 
into covert visualization for future use. 

What is the nature of this process? The 
analogy with language has its limits. Al- 
though some imitation may take place in 
early childhood, the acquisition of language 
is characterized by interaction with others 
(e.g., Brown & Bellugi, 1964) and by the
fact that the child both encodes and decodes 
verbal messages. This does not take place in 
other media where the child serves mainly 
as a decoder. 

Nevertheless, there are several indications 
that even under such conditions, internaliza- 
tion of communication codes is possible. 
Most clearly, there is the possibility of imi- 
tation. Although Bandura (e.g., 1965) lim- 
ited himself to studying the imitation of
human models (although some were put on
film), there is no reason why a learner could 
not imitate nonhuman objects and their “be- 
haviors.” Piaget (1962) is quite explicit 
about this possibility and provides empirical 
observations to support it. The imitation
and internalization of operations are, accord- 
ing to him, a major element in the genera- 
tion of schematic images and are part of 
one’s developing intelligence. Thus, the fact 
that one does not use the scheme as an 
encoder or interact with others by means of 
such a code may not prevent its internaliza- 
tion. 

However, modeling overtly a particular 
transformation, which has the potential of 
becoming an internalized schematic media- 
tor, may have differential effects depending 
on the learners’ initial mastery of function- 
ally equivalent mediators. Instances in 
which learners were given ready-made medi- 
ators resulted often in interference (Bruner, 
1961; Gentile, Kessler, & Gentile, 1969; Jen-
sen, 1967). However, interference implies 
that the needed mediators are already avail- 
able in the learner’s repertoire. Improve-
ment, rather than interference, can be ex-
pected when the learner does not initially
master the explicitly modeled skill, that is,
when such a presentation is sufficiently 
novel and carries some promise for better 
performance. On the other hand, already 
skillful learners would be expected to im- 
prove from a presentation which is less ex- 
plicit and which calls upon a mediator they 
have already mastered to some degree. It 
can thus be hypothesized that initial ability 
interacts with the explicitness with which a 
particular transformation is presented. 

Referring to the issue of communication 
codes which represent schematic transfor- 
mations in space or time, we reason as fol- 
lows. If a medium such as film does present 
explicitly a particular transformation which 
could become a covert representational 
scheme as a consequence of imitation and 
internalization, then initially less skillful 
learners would be expected to demonstrate 
better mastery of that representational 
transformation. More skillful learners would
be expected to experience interference and
to demonstrate depressed performance. The
latter could, however, benefit more when
given the opportunity to provide that repre-
sentational transformation on their own.

In the three experiments reported below, 
the above hypotheses were tested. Explicit- 
ness of presenting the transformation to be
internalized was operationally defined as 
three points along a continuum: 

1. Maximal explicitness. Films are shown
that present a transformation, such as sin- 
gling out an item through gradually “zoom- 
ing in” on it (Experiments 1 and 2) or “lay- 
ing out” a solid object (Experiment 3), in 
fullest detail. That is, there is an initial 
state, a transformation it undergoes, and its 
resultant state (S1-tr-S2). In this condition,
the symbolic code of the medium is used to 
its fullest; learners are expected to imitate 
and to internalize the presented transforma- 
tion. We labeled it the modeling condition. 

2. Partial explicitness. Slides, instead of 
film, are shown which have only the initial 
and final states, leaving out the transforma- 
tion leading from the former to the latter 
(S1-S2). In this condition, learners are ex-
pected to covertly supply the linking trans-
formation. Consequently, this was labeled
the short-circuiting condition. 

3. Minimal explicitness. A slide is shown 
which depicts only the initial state (S1) and 
which leaves out the transformation and 
the resultant state (S2). Given appropriate
instructions, learners are expected to co- 
vertly activate and apply the appropriate 
transformation in order to come up with the 
final situation. This was called the activa- 
tion condition. 
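
The three points on the explicitness continuum can be summarized by recording which of the three elements (initial state S1, transformation tr, resultant state S2) each presentation shows. The Python sketch below is only a schematic restatement of the definitions above; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    shows_initial_state: bool   # S1
    shows_transformation: bool  # tr
    shows_resultant_state: bool  # S2

    def learner_must_supply(self):
        """Elements the learner has to generate covertly in this condition."""
        missing = []
        if not self.shows_transformation:
            missing.append("transformation")
        if not self.shows_resultant_state:
            missing.append("resultant state")
        return missing


CONDITIONS = [
    Condition("modeling", True, True, True),           # S1-tr-S2
    Condition("short-circuiting", True, False, True),  # S1-S2
    Condition("activation", True, False, False),       # S1 only
]
```

Read from top to bottom, the list shows the learner's covert contribution growing as the explicitness of the presentation decreases.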

The two filmic transformations selected 
for the experiments—singling out items 
through “zooming in” on them and “laying 
out” of objects—were chosen because they 
met two necessary criteria: (a) they were 
part of the medium's range of schematic 
codes and (b) there was reason to believe 
that these transformations can be internal- 
ized and thus enrich one’s covert media- 
tional system. 


EXPERIMENT 1 


The filmic code of “zooming in” on details
that are embedded in a wider context was 
selected as the transformation to be modeled 
by film. This coded transformation was se- 
lected because we had good reason to believe 
that “zooming in” on details is analogous to
the analytic process of singling out discrete 
components of a rich stimulus. Thus, there 
was reason to hypothesize that “zooming in”
has the potential of serving as a model, 
which could be imitated and internalized. 
It should, as a consequence, lead to an im- 
proved ability of cue attendance, as studied 
by Sieber and Lanzetta (1966) and Salomon 
and Sieber-Suppes (1972). 

Given the hypothesis presented earlier, 
the experiment called for four experimental 
conditions: modeling (maximal explicit- 
ness), short-circuiting (partial explicitness), 
activation (minimal explicitness), and a no- 
treatment control group. Since, however, ap- 
titude-treatment interactions were expected, 
rather than main effects, subjects’ initial 
cue-attendance ability had to be pretested. 


Method 


Stimuli. The modeling stimulus condition con- 
sisted of three super 8-millimeter films. Each of
these depicted one of Breughel’s paintings (“Chil- 
dren in the Playground,” “Proverbs,” and “Winter 
in the Village”). The camera “zoomed in” on de- 
tails in a random sequence (e.g., a particular child 
playing, a woman in a window, etc.), and “zoomed 
out” again. This was repeated 80 times in each 
film. The short-circuiting stimulus condition con- 
sisted of three series of slides, corresponding to the 
three films in the modeling condition. In each se- 
ries, a slide depicting one of the three paintings 
was shown first, followed by another slide depicting 
a detail, then the whole painting again, followed by 
another detail. This was repeated 80 times in
each series. The details that were singled out were 
identical to the ones shown in the modeling films, 
with the same random order of presentation and 
length of exposure. The activation condition con- 
sisted of only three slides, each showing one of the 
three paintings. 

Procedure. Subjects in all three treatment condi- 
tions were required to report in writing 80 details 
they had noticed in each film or slide. This was re- 
peated three times (once for each painting) so that 
each subject reached the criterion of reporting 240 
details altogether. 

Subjects in each treatment condition, with the 
exclusion of the control group subjects, were seated 
together in one room, given an introduction and
examples, and then worked individually; that is, each
subject reported in writing the 80 details he noticed
upon viewing a film or slide. Once the subject had
finished, an experimenter read over his list of noticed
details. In case a detail was mentioned more than 
once, the subject was requested to replace it. Time 
to criterion varied: subjects in the modeling and 
the short-circuiting groups worked according to the 
speed determined by the presentations; the sub- 
jects in the activation group worked as long as 
needed, and the slide was projected until the last 
subject finished his task. 

Subjects. Eighty eighth graders, all from one school,
participated in the study. They were randomly as- 
signed to four groups (n = 20): modeling, short- 
circuiting, activation, and a pretest- and posttest- 
only control group. There was an equal number of 
males and females in all groups. 

Measures. A pre- and posttest measure of cue 
attendance was group administered to all sub- 
jects. The purpose of the cue-attendance pretest 
was to serve as an aptitude measure upon which
posttest scores could be regressed, given the hy- 
pothesis that treatments and aptitude interact with 
each other. 

The cue-attendance* pre- and posttests were similar,
though not identical, in nature. Subjects, seated in
a group, were shown a slide depicting a rather complex
visual montage of items and were asked to report in
writing the maximum number of items they could notice.
There was a time limit of seven minutes.


TABLE 1

MEANS AND STANDARD DEVIATIONS OF PRE- AND
POSTTESTS FOR ALL GROUPS

                            Cue attendance
Group                Pretest           Posttest
                     M       SD        M        SD
Modeling             13.12   6.90      31.3ᵃ    8.60
Short-circuiting     11.45   6.07      25.1ᵇ    7.40
Activation           13.05   4.17      32.8ᵃ    9.95
Control              12.70   6.65      16.7ᶜ    6.50

Note. Means which are significantly different
from each other (p < .05) have different superscripts.


Results 


As shown in Table 1, all four groups had 
quite similar scores on the pretest. However, 
significant group differences were found (F 
= 4.52, df = 3/76, p < .01, one-way analy- 
sis of variance) on the posttest. All three 
experimental groups reported noticing sig- 
nificantly more items on the cue-attendance 
posttest than the control group, indicating,
in accord with previous studies (Salomon
& Sieber-Suppes, 1970; Sieber & Lanzetta,
1966), that this skill is modifiable.

In addition, the posttest performance of
the modeling group was not significantly
better than that of the activation group, but
both were better than that of the short-circuiting
group. This seems to suggest that, on the
average, much explicitness or very little
explicitness is more effective than partial
explicitness.


*More information pertaining to this measure
can be found in Salomon and Sieber-Suppes (1972).


TABLE 2

LINEAR REGRESSION COEFFICIENTS OF POSTTEST
SCORES PREDICTED FROM PRETEST SCORES
FOR ALL GROUPS

Group                r        b coefficient    tᵃ
Modeling             −.46     −.572            2.21*
Short-circuiting     +.28      .341            1.22
Activation           +.52     1.24             2.78**
Control              +.48      .469            2.50*

ᵃ All t values test the difference from zero.
* p < .05.
** p < .01.

More directly related to our hypothesis 
concerning an aptitude-treatment interac- 
tion is the examination of regression lines 
within each treatment condition. Thus, cue-
attendance posttest scores were regressed on
pretest (aptitude) scores, as presented in
Table 2. 
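
As a rough consistency check on the tabled coefficients (an illustration added here, not part of the original analysis), a simple regression slope can be recovered from the correlation and the two standard deviations as b = r × SD(posttest) / SD(pretest). The function name below is hypothetical; the inputs are the modeling and activation values printed in Tables 1 and 2.

```python
def slope_from_r(r, sd_pre, sd_post):
    """Simple-regression slope of posttest on pretest: b = r * SD_post / SD_pre."""
    return r * sd_post / sd_pre

# Modeling group: r = -.46, pretest SD = 6.90, posttest SD = 8.60
b_modeling = slope_from_r(-0.46, 6.90, 8.60)
# Activation group: r = +.52, pretest SD = 4.17, posttest SD = 9.95
b_activation = slope_from_r(0.52, 4.17, 9.95)

print(round(b_modeling, 2), round(b_activation, 2))
```

Both values agree, to rounding, with the b coefficients reported for these two groups (−.572 and 1.24).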

The analyses revealed a significant inter- 
action between the initial cue-attendance 
aptitude and the conditions of modeling and 
activation (t = 3.15, df = 36, p < .001; see
Figure 1).
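
The article does not state the formula behind this slope comparison. One common choice, assumed here purely for illustration, fits a separate line in each group, pools the residual variance, and refers the difference between slopes to a t distribution with n1 + n2 − 4 degrees of freedom, which for two groups of 20 gives the 36 reported above. The function names and the data below are invented for the sketch.

```python
import math

def fit_line(x, y):
    """Least-squares line; returns (slope, Sxx, residual sum of squares)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, rss

def slope_difference_t(x1, y1, x2, y2):
    """t statistic and df for the hypothesis that two regression slopes are equal."""
    b1, sxx1, rss1 = fit_line(x1, y1)
    b2, sxx2, rss2 = fit_line(x2, y2)
    df = len(x1) + len(x2) - 4           # two slopes and two intercepts estimated
    pooled_var = (rss1 + rss2) / df      # pooled residual variance
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return (b1 - b2) / se_diff, df

# Invented data: 20 hypothetical subjects per group.
pretest = list(range(5, 25))
post_modeling = [40 - 0.5 * x + (1 if i % 2 else -1) for i, x in enumerate(pretest)]
post_activation = [15 + 1.2 * x + (1 if i % 3 else -1) for i, x in enumerate(pretest)]

t_stat, df = slope_difference_t(pretest, post_modeling, pretest, post_activation)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative slope in one group and a positive slope in the other, as in the invented data, is the crossover pattern an aptitude-treatment interaction predicts.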

In agreement with the hypothesis, low
scorers on the aptitude test appeared to have
benefited relatively little from the activation
condition. This condition, as hypothesized,
called upon mediators that low-aptitude
scorers were not expected to have mastered initially.
However, they profited far more from
the modeling condition which, as we have
reasoned, provided a medium-generated
code that could be imitated and internalized.
While low-aptitude scorers appeared to
learn from the highly explicit filmic presentation
of the transformation, initially high
scorers performed less well following modeling
and far better following the activation
condition. It appears that the very explicit
modeling films offered a mediating transformation
which interfered with the already
developed mediational capability of those
subjects. They did profit, however, from a
condition which called upon mediators that
they had mastered already.


FIGURE 1. Cue-attendance posttest scores regressed on pretest scores for each group (modeling, short-circuiting, and activation). Horizontal axis: cue-attendance pretest; vertical axis: cue-attendance posttest.

The short-circuiting condition appeared 
to benefit low- and high-aptitude scorers 
alike. 

Discussion 

The results of this experiment lend tenta- 
tive support to the two major hypotheses: 

1. A filmic transformational scheme, in 
this case “zooming in” and “zooming out,” 
is learnable; it can be used covertly in a 
similar task with new material (the cue-at- 
tendance posttest). 

2. There is a negative relationship between 
the explicitness of presenting the transfor- 
mational scheme (i.e., the visual presenta- 
tion of the operation) and the learner’s ini- 
tial ability to execute it covertly on his own. 
The less able he is, the more he profits from 
such modeling. 

Although results were as expected, there 
is nothing in the data to suggest how the 
learning took place when one “internalized” 
the filmic scheme. If it was imitation, then it 
should not have been limited to only one 
scheme, given that our films also used other 
operational schemes at the same time. For 
instance, the order in which details were 
singled out was such an additional scheme, 
and subjects would be expected to imitate it 
as well. Second, if it was not just simple visual
imitation, and the learning was mediated
by internal verbalization, then induced verbalization
should lead to even more improvement.
Experiment 2 was designed to
partially replicate Experiment 1 and to shed
light on the above questions. 


EXPERIMENT 2 


This experiment partially replicated the 
former one. However, one experimental vari- 
able was added, namely, induced overt label- 
ing of the items that were singled out by the 
camera (in the modeling condition) or by 
the subject himself (in the activation con- 
dition). It was reasoned, following Kendler 
and Kendler (1968), that if verbalization
plays a mediating role in learning from the 
modeling films, then induced labeling should 
improve posttraining performance. Low- 
verbal-ability subjects were expected to gain 
more from such labeling than more verbally 
able subjects who apparently were more 
likely to use verbal mediation on their own. 
However, if this kind of learning does not 
rely upon verbal mediation, then, it was 
reasoned, induced verbalization would have 
no effect on learning. 

In the present experiment, the short-cir- 
cuiting condition was dropped, leaving only 
the modeling and activation conditions. 
These are the two extreme ends of the con- 
tinuum of modeling a covert skill through 
the use of filmic operational schemes. This 
resulted in a 2 × 2 (Modeling versus Activation
× Verbalization versus No Verbalization)
factorial design.


Method 


Stimuli. There were two stimulus conditions 
which were identical to the modeling and activation
conditions of the previous experiment. Half
of the subjects saw the films which modeled the 
operation of singling out details from a rich dis- 
play, using the “zooming in” technique (the mod- 
eling condition). The other half saw three slides of 
these paintings on which the films were based (the 
activation condition). 

Procedure. There were four treatment groups. 

1. Modeling-verbalization group. Subjects saw 
the three modeling films. While viewing them, one 
subject was randomly called upon on each trial to 
label aloud the detail on which the camera was 
then “zooming in.” Subjects saw the films in groups
of seven (half the size of the group). It was as- 
sumed that all subjects were busy labeling the de- 
tails to themselves, since no one could know whose 
turn it would be next to label aloud. The criterion 
to be reached by each subgroup of seven subjects 
was 42 labeled details per film (6 reported details
per subject). Each child was reinforced for his labeling.
The whole training period lasted for about
three hours.

2. Modeling-nonverbalization group. Subjects 
were run in the same fashion as above, with the ex- 
clusion of verbalization. No verbal responses were 
required, and no reinforcements were given. Subjects
were told to “notice what exactly the camera
does and to note to themselves the details they
observe.” The films and the criterion were the
same (the criterion was actually set by the number of
“zoom ins” per film).

3. Activation-verbalization group. The same pro- 
cedure as in the first group was maintained. How- 
ever, instead of being exposed to the films which 
singled out details for the subjects, there was a 
static slide and subjects had to report noticing the 
same number of details from the slide, as in the 
first group. 

4. Activation-nonverbalization group. This 
group served actually as a control group since it 
received the same instructions as the other groups, 
but the subjects watched the static slides in silence. 
Instructions were those given to the second group. 

Subjects. Fifty-six eighth graders were randomly chosen
from two classrooms. These were then randomly
assigned to the four groups (n = 14). Within
each group, the 14 subjects were again randomly 
divided into two subgroups of 7 each. This was the 
unit with which the experimenters worked. 

Measures. The number of aptitude and post- 
training measures was increased from Experiment 
1. The reasons for increasing the number of pre- 
test aptitude measures were, first, that cue attend- 
ance, as measured by us, was a rather limited apti- 
tude with unknown relations to other aptitudes. 
Hence, the Embedded Figures Test (measuring 
Field Dependency, Witkin, 1964) was added. Sec- 
ond, since we hypothesized verbal ability to inter- 
act with treatments, an appropriate test had to 
be added. Posttest measures were increased to enable
us to examine what else might be internalized
from the films.

There were three pretest aptitude measures administered
in a random order one day prior to
experimentation: (a) cue-attendance pretest, very 
similar in nature and requirement to the tests used 
in Experiment 1; (b) Witkin's Embedded Figures 
Test (to measure Field Dependency); and (c) an 
Israeli Standardized Verbal Ability Test (MILTA), 
for which norms based on national samples are 
available. 

The posttest measures, administered in a random
order one day following the end of experimentation,
were as follows: (a) A cue-attendance
posttest, which was based, like the previous cue-attendance
tests, on a complex and rich visual display,
from which subjects had to single out details
and report them in writing. (b) An organization
measure, extracted from the cue-attendance test,
which indicated the extent to which details were
noticed in some spatial order. This measure was
included since we hypothesized that subjects might 
imitate not only the act of “zooming in” but the 
order (in fact, random order) in which items were 
“zoomed in” on. Whenever a subject reported no- 
ticing a detail which was spatially adjacent to a 
formerly reported detail, he received one point. 
Subjects were not told to report details in any or- 
der, nor was the question of order mentioned. 
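
The scoring rule for the organization measure can be sketched as follows. The article does not say how adjacency was judged or whether "formerly reported" means the immediately preceding report, so the sketch makes three labeled assumptions: each reported detail is coded as a (row, column) cell on a hypothetical grid laid over the display, two cells are adjacent when they differ by at most one step in each direction, and each report is compared only with the one just before it.

```python
def organization_score(reported_cells):
    """Count reports that are spatially adjacent to the report just before them.

    Each report is a (row, column) cell on a hypothetical grid laid over the
    display; two cells count as adjacent when they differ by at most one step
    in each direction (diagonals included).
    """
    score = 0
    for previous, current in zip(reported_cells, reported_cells[1:]):
        row_gap = abs(previous[0] - current[0])
        col_gap = abs(previous[1] - current[1])
        if previous != current and max(row_gap, col_gap) <= 1:
            score += 1
    return score

orderly_scan = [(0, 0), (0, 1), (1, 1), (3, 3)]    # mostly neighbor-to-neighbor
scattered_scan = [(0, 0), (3, 3), (0, 2), (4, 0)]  # jumps around the display

print(organization_score(orderly_scan), organization_score(scattered_scan))
```

Under these assumptions a subject who imitated the random order of the modeling films would earn few points, while a subject who scanned the display systematically would earn many.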


Results 


Means and standard deviations of the 
four groups on the three pretests and the two 
posttest measures are presented in Table 3. 

It should be noted that cue-attendance 
pretest scores are higher than those in Ex- 
periment 1 due to richer and more readily 
available details in the stimuli in Experiment 2.
Mean scores are lower on
the posttest than on the pretest because of
more difficult stimuli. 

TABLE 3

MEANS AND STANDARD DEVIATIONS FOR EACH GROUP

                                                        Cue attendance
                       MILTA         Field dependency   Pretest        Posttest       Organization
Group                  M      SD     M      SD          M      SD      M      SD      M      SD
Modeling
  With verbalization   92.8   13.9   6.5    4.7         50.9   10.9    35.9   2.7     36.2   15.4
  No verbalization     92.7   12.8   6.5    4.5         48.6    9.6    34.0   6.2     37.5   18.7
Activation
  With verbalization   87.4   10.7   6.8    4.0         48.8    9.5    33.7   6.3     35.3   17.8
  No verbalization     89.1   11.9   6.5    3.2         50.9   10.8    32.6   6.5     33.7   10.2

Note. Abbreviation: MILTA is an Israeli Standardized Verbal Ability Test.


Factorial 2 × 2 analyses of variance
failed to produce statistically significant F
ratios for either of the two main effects or
for the interaction between them. The failure
to detect significant differences between
the treatments and the control group disa- 
grees with the results of Experiment 1, al- 
though the ages of the subjects in the two 
experiments were very similar, and the stim- 
uli were identical. The only major difference 
between the two experiments was in the 
level of training criterion that each subject
had to reach individually. In Experiment 1,
each subject had to report in writing 80 details
that were noticed per film (or slide)
and 240 details altogether. In Experiment 2,
the individual criterion was only 6 verbally
reported details per film (or slide), and 18 
altogether, per subject. Thus, it would be 
reasonable to postulate that in Experiment 2 
the criterion to be reached was apparently 
far too low to produce any observable dif- 
ference which reaches the needed level of 
significance. 

Our main concern was, however, with aptitude-treatment
interactions (ATI). To detect
such interactions, posttest scores were
regressed on aptitude scores for each group. 
The intercorrelations among the measures 
are given in Table 4. 

Concerning the differential relationship
between pretest and posttest cue attendance,
we find an interaction between
the modeling-no-verbalization and
the activation-no-verbalization (control)
conditions (difference between regression
slopes: t = 3.30, df = 26, p < .01). Such an
interaction does not take place, however, 
between modeling and activation when overt 
verbalization is required. 

In other words, whenever subjects are re- 
quired to activate covertly the necessary 
process of singling out details or when they 
are to overtly label the singled-out details, 
initially good cue attenders benefit most. 
When, on the other hand, no labeling is re- 
quired and the process of singling out de- 
tails is explicitly modeled, poor cue attend- 
ers benefit most (Figure 2). 


TABLE 4

INTERCORRELATIONS BETWEEN MEASURES SEPARATELY FOR EACH GROUP

                                            Field        Cue attendance
Measure/group                     MILTA     Dependency   Pretest       Posttest      Organization
MILTA
  Modeling, verbalization         --        .875**       .560*         .271          .605*
  Modeling, no verbalization      --        .528*        .552*         [illegible]   .800
  Activation, verbalization       --        .531*        .523*         .780*         .153
  Activation, no verbalization    --        .653*        .630*         .580          −.063
Field Dependency
  Modeling, verbalization                   --           .271          .934          .686*
  Modeling, no verbalization                --           .355          −.088         .516
  Activation, verbalization                 --           .266          .984          .073
  Activation, no verbalization              --           .377          .377          −.278
Cue-attendance pretest
  Modeling, verbalization                                --            .403          .356
  Modeling, no verbalization                             --            .007          .372
  Activation, verbalization                              --            [illegible]   [illegible]
  Activation, no verbalization                           --            .549          .311
Cue-attendance posttest
  Modeling, verbalization                                              --            −.108
  Modeling, no verbalization                                           --            −.082
  Activation, verbalization                                            --            .453
  Activation, no verbalization                                         --            .566*

Note. n = 14 in each group. Abbreviation: MILTA is an Israeli Standardized Verbal Ability Test.

* p < .05.
** p < .01.

FIGURE 2. Cue-attendance posttest scores regressed
on pretest scores for each group.


Another interaction is observed when cue- 
attendance posttest scores are regressed on 
MILTA pretest scores. Here, the interaction 
is also between the modeling-no-verbaliza- 
tion and the activation-no-verbalization 
conditions. While initially less verbally able
subjects benefit most from the modeling 
condition, they benefit far less from the ac- 
tivation condition. The converse is true for 
the more verbally able subjects (difference 
between regression slopes: t = 2.65, df = 26, 
p < .05; Figure 3). It should be noted, how- 
ever, that the correlation between MILTA 
and cue-attendance pretest scores is .474 (N 
= 56, p < .01); hence, the similarity of the 
two aptitude-treatment interactions. 

It appears that when no labeling is re- 
quired, initially high-aptitude scorers bene- 
fit more when the transformation, which 
they have mastered already, is called upon 
(activated) rather than modeled. For the 
initially low-aptitude scorers, the converse 
is true. We thus find additional support for 
our hypothesis concerning the differential 
effectiveness of explicitly modeling a trans- 
formation via a filmic code. 

Verbalization, or labeling, usually facili- 
tates the performance of those who are al- 
ready verbally more able, while it does not 
facilitate the performance of subjects who 
are less verbally able. This seems to suggest 
that the latter, who tend also to be poor 
cue attenders, internalize the transformation 
and use it covertly as a nonverbal mediator.
Internal verbalization seems therefore to be 
an unnecessary mediator for such perform- 
ance to take place. 

Other interactions emerge when organiza- 
tion scores are regressed on MILTA or 
Field-Dependency scores (overall correla- 
tion between the two is .692), as shown in 
Figures 4 and 5 (F = 5.16 and F = 5.62, 
respectively, p < .05). The interesting point 
to note is that low MILTA scorers and high 
Field Dependency ones scanned the visual 
field in a much less organized way, following 
the modeling treatments, than similarly low 
scorers following activation conditions. On 
the other hand, low scorers on the two ap- 
titude tests, who were not exposed to the 
model (which, as will be recalled, displayed
a random order of “zooming in” on
details) showed better organization in their
scanning, which did not differ significantly 
from that of those with high aptitude. It 
should also be noted that high-aptitude sub- 
jects scanned the field in a much more or- 
ganized fashion following the modeling con- 
ditions than those high-aptitude subjects 
in the activation conditions. 

FIGURE 3. Cue-attendance posttest scores regressed on MILTA pretest scores for each group. (Abbreviation: MILTA is an Israeli Standardized Verbal Ability Test.)

FIGURE 4. Organization scores regressed on MILTA pretest scores for each group. (Abbreviation: MILTA is an Israeli Standardized Verbal Ability Test.)


Whether this shows some kind of overcompensation
on the part of the latter subjects
in the face of the unorganized model remains
an open question. It is the case, however,
that high MILTA scorers, following
the modeling conditions, noticed signifi- 
cantly fewer details (as measured by the 
posttest) than high MILTA scorers who 
were exposed to the activation conditions. 
Comparing the cue-attendance performance 
of the 10 highest scorers on MILTA within 
the modeling conditions with the 10 highest 
MILTA scorers in the activation conditions 
shows that the latter perform a bit better 
than the former (t = 1.8, df = 18, p < .10). 
The opposite occurs when we compare the 
organization scores of these two subgroups 
(t = 2.01, df = 18, p < .05). Thus, it ap- 
pears that when exposed to a model which 
(a) shows an operation with which these 
subjects are familiar and (b) displays 
another operation (random order of scan- 
ning) that apparently disagrees with their 
more orderly style, these subjects put more 
energy into imposing structure on their re- 
sponses than into producing a large number 
of such. Therefore, their organization scores 
correlate highly with verbal ability when in 
the modeling conditions (.70 with MILTA 
and .72 with Field Dependency) and not at 
all with their posttraining cue-attendance 
performance (—.08). The exact opposite 
pattern takes place in the activation condi- 
tions (see Table 4). 

Discussion

In general, then, apparently due to train- 
ing criteria that were too low, no clear effects 
of either modeling the schematic operation 
or of induced verbalization were found. The 
assumption that all subjects in the verbali- 
zation groups engage in spontaneous labeling 
while expecting to be called upon, appears to 
be questionable. It is more likely that the 
low-verbal-ability subjects engage in label- 
ing only after being called upon. Therefore, 
the criterion of 18 labels which they had to 
provide was far too low for them. On the 
other hand, the aptitude-treatment inter- 
actions show that learning—in terms of in- 
ternalization of the modeled operation— 
takes place. However, this is restricted to 
the low-aptitude scorers, while the high 
scorers either experience interference or try 
to rely on their high verbal ability to over- 
come the disagreement between the oper- 
ations on which they typically rely and the 
ones explicitly modeled for them. What the 
interactions with MILTA as predictor sug- 
gest is that some subjects (particularly low- 
verbal-ability scorers) imitate the visual 
schematic operation quite directly, some- 
thing that is seen in the number of details 
they report and in the reduced organization 
of their responses. Other subjects (particularly
the more verbally able ones) do not
just imitate the visual schematic operation
but internalize it through verbal mediation.
This in turn causes them to produce fewer
responses but to chunk and organize them
better.


FIGURE 5. Organization scores regressed on Embedded Figures Test pretest scores for each group.

Experiment 3 was partly designed to shed 
more light on this question. It was hy- 
pothesized that if more verbally capable 
subjects do impose logic on the operation 
which they internalize, then their learning 
should not be debilitated. This, however, re- 
quires one necessary condition, namely, that 
the schematic operation to be internalized 
can be executed by means of logical opera- 
tions and not only by means of dynamic 
images, as is apparently the case with the 
“zoom in” operation.


EXPERIMENT 3 


In this experiment, we returned to the 
design of Experiment 1, leaving out only the 
activation condition and choosing to model 
or short-circuit an operation which is far 
more novel to subjects than the one modeled 
before. The schematic operation modeled 
here was that of “laying out” solid objects. 
This again is an operation which is within 
the filmic range of schematized operations 
and which at the same time resembles a 
covert process in use when certain “visual- 
ization” problems (e.g., in learning geome- 
try) are encountered. Thurstone and Guil- 
ford designed measures of one’s mastery of 
this operation such as the Paper-Folding 
Test, Surface Development Test, and the 
Form Board Test (French, Ekstrom, & 
Price, 1963). This operation can, however,
also be executed along logical lines, instead
of as an act of vivid visualization. 


The hypotheses of the present experiment
were as follows:

1. Students can and do internalize the
visually modeled scheme of “laying out”
solid objects shown on the screen, as can be
observed in their improved scores on visualization
tests, thus demonstrating that this
kind of learning from media is not limited
to the operations studied in the previous
experiments.

2. On the basis of the findings in Experiment 2,
verbally able subjects (assumed to
rely on internal verbalization rather than on
visualization) would profit from visual
modeling as much as less verbally able subjects;
however, they would tend to execute
the operation covertly by means of internal
verbalization rather than internal visualization.


Method


Stimuli. There were two treatment stimuli:
modeling and short-circuiting. The modeling stimulus
consisted of a 15-minute film in which five
solid objects appeared and then gradually were
laid out, side after side, to produce a two-dimensional
plan of the object. Once an object was laid
out, it gradually folded up again. The short-circuiting
stimuli consisted of a series of 10 slides (5
pairs). The first slide in a pair showed the solid
object, while the subsequent member of the pair
displayed the same object in a laid-out position.

Procedure. There were three experimental
groups: modeling, short-circuiting, and a control
group. The modeling group was given a general
introduction and then shown the film once. It saw
the same film again on the next day and a third
time on the third day. No responses were required.
The short-circuiting group received the same treatment
but with the slides instead of the film. The
control group served as a pretest- and posttest-only
group. No treatment was given.

Subjects. The subjects were 42 ninth-grade students
in a vocational school. All were males. They
were randomly assigned to the three groups (n = 14).

Measures. Three pretests were given to serve as
aptitude measures: (a) visualization ability, measured
by the Paper Folding Test designed by
Thurstone (French et al., 1963); (b) the subject’s
average grade in language studies; and (c) the subject’s
average grade in mathematics. There was
reason to believe that any one of them could
serve as a valid predictor of learning the operation
of “laying out.” There was only one posttraining
measure, namely, a test of visualization ability, as
measured by Thurstone’s Surface Development
Test (French et al., 1963).


TABLE 5

MEANS AND STANDARD DEVIATIONS FOR EACH GROUP

                     Visualization                                              Visualization
                     (pretest)        Language studies        Mathematics                (posttest)
Group                M       SD       M      SD               M             SD           M
Modeling             62.6    14.9     6.0    [illegible]      [illegible]   [illegible]  [illegible]
Short-circuiting     62.2    11.4     6.3    [illegible]      [illegible]   [illegible]  [illegible]
Control              71.45   13.7     6.0    .79              5.6           1.0          57.0

Note. Means which differ significantly from each other (p < .05) have different superscripts.


Results and Discussion 


Although assignment to groups was ran- 
dom, a one-way analysis of variance re- 
vealed significant differences between mean 
pretest visualization scores of the groups 
(F = 3.52, df = 2/39, p < .05).
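
The degrees of freedom of a one-way analysis of variance follow directly from the design: k − 1 between groups and N − k within groups, where k is the number of groups and N the total number of subjects. The short sketch below (function name hypothetical, added here as an illustration) applies this to the designs of Experiments 1 and 3.

```python
def one_way_anova_df(group_sizes):
    """Return (between-groups df, within-groups df) for a one-way ANOVA."""
    k = len(group_sizes)          # number of groups
    n_total = sum(group_sizes)    # total number of subjects
    return k - 1, n_total - k

# Experiment 1: four groups of 20 subjects each
print(one_way_anova_df([20, 20, 20, 20]))
# Experiment 3: three groups of 14 subjects each
print(one_way_anova_df([14, 14, 14]))
```

Four groups of 20 give 3 and 76 degrees of freedom, as reported for Experiment 1; three groups of 14 give 2 and 39.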

Since, however, pretest visualization
scores correlated positively with posttest
visualization scores in all groups and since
linearity requirements were met, analysis of
covariance was used. The Scheffé post hoc
comparison showed that the mean posttest
visualization score of the modeling group
was significantly higher than that of the
short-circuiting group and that both means
were significantly higher than that of the
control group (Table 5). Thus, the results
appear to be in agreement with those of
Experiment 1.


TABLE 6

INTERCORRELATIONS BETWEEN MEASURES FOR EACH GROUP

                        Visualization   Language   Mathematics   Visualization
Measure/group           pretestᵃ        grades     grades        posttestᵇ
Visualization pretest
  Modeling              --              .54*       .41           .29
  Short-circuiting      --              .20        .24           .85
  Control               --              .30        .18           .31
Language grades
  Modeling                              --         .12           −.36
  Short-circuiting                      --         [illegible]   .19 [sign illegible]
  Control                               --         .14           .63
Mathematics grades
  Modeling                                         --            .38
  Short-circuiting                                 --            .21
  Control                                          --            .28

Note. n = 14 in each group.
ᵃ The visualization pretest consisted of the Paper-Folding Test.
ᵇ The visualization posttest consisted of the Surface Development Test.
* p < .05.


FIGURE 6. Visualization posttest scores regressed on language grade for each group (modeling, short-circuiting, and control). Horizontal axis: language scores; vertical axis: posttest visualization scores.

When correlational analyses were done 
(Table 6), an interaction emerged between 
the subjects’ grades in the language studies 
and their posttest visualization scores (Fig- 
ure 6). In the control group, subjects with 
higher language scores performed better on 
the visualization test than low scorers (t 
ratio for the regression slope is 2.8, df = 12, 
p < .05). In the modeling group, the con- 
verse takes place (difference between the 
two regression slopes: t = 2.08, df = 24, 
p < .05). The slope of the short-circuiting 
group is negligible and can be regarded as 
not differing from zero. However, visual 
modeling of the operation facilitated the 
learning of the less verbally able subjects 
more than that of the better able ones. If 
the latter subjects did impose verbal logic 
on the operation, as we thought they did, 
then their posttest performance should not 
have been this low. It is, however, possible 
that the operation of “laying out” objects 
does not yield to logic and has to be exe- 
cuted as a schematized covert visual image. 
In that case, verbally able subjects would
try, in vain, to utilize logic; hence, the little
benefit they had from the modeling condition.
In essence, then, the results of the
present experiment further strengthen those 
previously obtained. It would be possible to 
ask whether modeling helps subjects who 
are nonverbal thinkers more than it helps 
those who are verbal thinkers or whether it 
helps subjects with generally low intellectual 
abilities more than those of higher intellec- 
tual abilities. The fact that initial visual- 
ization ability did not interact with treat- 
ments while verbal ability did, suggests the 
former possibility is more credible. 
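
The t ratio reported for the control group's regression slope can be checked against Table 6. In simple regression, the test of the slope against zero is equivalent to the test of the correlation, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below is an illustration added here, with a hypothetical function name; it uses the control-group correlation between language grades and posttest visualization (r = .63, n = 14).

```python
import math

def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a correlation or simple slope against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t_control = t_from_r(0.63, 14)
print(round(t_control, 2), "with df =", 14 - 2)
```

The result rounds to 2.8 with 12 degrees of freedom, in line with the value given in the text.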


GENERAL DISCUSSION 


The three experiments have shown that 
improvements in two kinds of covert skills 
can take place as a result of training with 
films which model those operations. Further, 
it was found that some subjects, notably 
those with poor scores on the relevant aptitudes,
profit more from such modeling, while those 
with high scores are hindered in their per- 
formance. The latter subjects profit more 
when asked to execute the operation covertly 
on their own. It was also observed that vi- 
carious learning, in the sense that a subject 
executes covertly a response while another 
verbalizes it (Experiment 2), is not the same 
as when he himself has to act out the re- 
sponse (Experiment 1). Hence, when train- 
ing criterion is lowered, learning from a 
visual model is reduced. Finally, the hy- 
pothesis that some subjects are better off 
internalizing the model through nonverbal 
mediation, received indirect support. 

However, one may wonder whether all 
that has been shown here is perhaps no more 
than another case of learning from visual 
displays. This, obviously, is not very novel 
in light of what both daily experience and
research have repeatedly shown. Gagné and
Gropper (1965) and Travers (1970), to men- 
tion only a few, have repeatedly shown that 
learning from visual displays is not only 
possible but even relatively more powerful 
than verbal instruction. 

There is, however, a major and very es- 
sential difference between the usual kind of 
studies dealing with visually based instruc- 
tion and the ones reported here. Clearly, the 
operations displayed on the screen by us cannot
be termed signs or symbols since they
do not serve as completely arbitrary rep- 
resentations of operations but are the oper- 
ations themselves. In this respect, they are 
iconic representations in the sense that we 
assume that they have something in common 
with the covert operations which they “externalize.”
That is, one can assume that they
resemble the operations which the learners 
should have executed or actually did exe- 
cute covertly. The operations utilized here 
are not analogous to grammatical forms 
used in language or to verbal concepts. 
Nevertheless, they differ markedly from the 
usual information displayed in films or 
slides. It was not the details of Breughel’s 
paintings nor the structure of the solid ob- 
jects which was to be learned from these 
slides but rather the schematic operations of 
singling out details or “laying out” solid objects.
Moreover, the question we posed was
not whether visual information is better for 
instructional purpose than verbal information
or modeling is better than “autonomous”
learning; but rather, whether such
schematic operations as sampled here, which 
are part of a medium's unique range of com- 
munication codes, are internalizable as sche- 
matic operations and, hence, whether they 
are used as covert schemes. 

One may ask whether exposure to media, 
such as print, film, TV, and the like leads to 
the cultivation of covert representational 
systems, as Olson (1972) or Bruner et al. 
(1966) claim. This, we hypothesized, is a 
matter of internalizing both the codes, con- 
ventions, and schematic ways of represent- 
ing something, which are unique to media, 
and their covert use as part of one’s rep- 
resentational system. The experiments re- 
ported here were a first attempt at studying 
whether such internalization is at all possi- 
ble, and if so, by whom. 

The internalization of operations and 
their use in a representational capacity are 
an important issue in Piaget’s theory. However,
he strongly emphasizes the importance
of manipulatory learning rather than observational
learning. Nevertheless, in his discussion
of imitation (Piaget, 1962), observations
of operations appear to play as
important a role in his theorizing as manipulation.
But this may not be an all-or-none
question. Logical operations, of the kind 
studied by Piaget, may be learned solely by 
manipulation (Wohlwill, 1970). Not so with 
other operations which, as the ones studied 
by us, do not follow necessarily any agreed-upon
logic and are rather conventional
schemes. 

Educationally speaking, what our studies 
hint at is that certain mental skills may be 
adopted from communication media and
thus be used to expand one's range of covert 
skills. The question, then, is not whether 
this is a “better” mode of instruction, but 
whether one can use visual media not just 
to acquire “knowledge that” but also 
“knowledge how to,” particularly for those 
learners who appear to have difficulties with 
other and more common types of instruction.

This study assessed the effects of nonparticipant observation on the
behavior of the teacher and mentally deficient pupils in a language 
development class. Thirty days of classroom behavior were recorded 
on videotape under an ABAB design. The conditions were: (Condition
1) 10 days, observer absent; (Condition 2) 10 days, observer present; 
(Condition 3) 5 days, observer absent; and (Condition 4) 5 days, ob- 
server present. The subjects were “blind;” they believed they were 
being recorded only when the observer was in the classroom. The 
videotapes were observed in random order by “blind” trained observers.
Behavioral measures indicated that nonparticipant observation
increased the frequency, but not the appropriateness, of teacher-pupil
interactions and did not affect the appropriateness of student
behaviors.


Recent years have witnessed the extensive 
application of learning principles in class- 
room management and teaching methods 
(see O'Leary & O'Leary, 1972). Data col- 
lected by nonparticipant observers in the 
classroom have been widely used to assess 
program effectiveness, and until recently 
the validity of such data has been accepted 
solely on the basis of high interobserver re- 
liabilities. However, the effects of observer 
reactivity, observer bias, and methods of 
computing reliability on the internal and ex- 
ternal validity of observational data are 
currently being questioned empirically 
(Johnson & Bolstad, 1972; Kazdin, 1973). 

The nature of observational reactivity has 
long been a question in social psychology. 
Subjective evaluation of the extent of contaminant
introduced has ranged from “little
effect, if any” (Heynes & Lippitt, 1954, p.
399) to the suggestion that the “patently
visible” observer produces changes in behavior
that can diminish the generalizability
of findings (Webb, Campbell, Schwartz, & 
Sechrest, 1966). Studies attempting to assess 
observational artifact and habituation to it 
have yielded conflicting results. Johnson 
and Bolstad (1972) noted that both the 
paradigm of measurement and the nature 
of dependent variables may contribute to 
this ambiguity. Further, the nature of re- 
activity may depend upon the conspicuous- 
ness of the observer, individual differences 
of the subjects, personal attributes of the 
observer, and the rationale for observation. 
Perhaps because of methodological prob- 
lems, only two studies have directly assessed 
the effect of observer presence in contrast 
to observer absence. White (1973), using 
concealed overhead observation, found that 
alternating 30-minute periods of observer 
presence and observer absence had the ef- 
fect of reducing significantly the activity 
level of mother-child dyads in the observer- 
present condition within a laboratory set- 
ting. In a second study, using the same 
equivalent time-samples design with a larger 
subject pool and a behavioral coding system, 
he found that observer presence was associated
with lower rates of deviant behavior
for older children. Rate of mother deviant
behavior and level of instability in rates of 
social interaction were not associated with 
conditions of observer presence and observer 
absence. No comparable observer-present 
versus observer-absent paradigm has been 
used in classroom settings. 

Two studies have dealt with observer 
reactivity within educational settings. Gus- 
sow (1964) examined the observer-observed 
relationship in four fourth-grade classrooms 
in four different schools and concluded that 
even in situations where the observer is nonparticipant,
the observer and the observed
were linked in a continuous and developing 
relationship. Gussow employed retrospective 
data collection using narrative-descriptive 
reports that were not germane to the issue 
of reactivity. In fact, very little data of even 
a narrative-descriptive nature appeared in 
his report. Therefore, his conclusions must 
be cautiously evaluated. Masling and Stern 
(1969) assessed observer reactivity using 
trained observers and the teachers and pu- 
pils in 23 different classrooms. Two days’ 
observations of each class were broken into 
33 five-minute data units which were ex- 
amined with the hypothesis that if the effects 
of observers diminished over time, there 
would be a smaller correlation between the 
first units of observation and the last than 
between later units and the last. However, 
the correlations obtained showed no discern- 
ible pattern over time. Two alternative con- 
clusions were proposed: (a) Observer in- 
fluence was negligible; (b) the effects of the 
observers were more complex than foreseen 
and affected various aspects of teacher and 
pupil behavior differentially. The authors 
acknowledged that the failure to include an 
observer-absent condition precluded the pos- 
sibility of studying observer effects directly. 
In addition, by studying habituation the 
authors may have been unable to detect an 
effect, since it seems unlikely that a signifi- 
cant habituation effect would have occurred 
in only two days. 

There is no consensus concerning the re- 
activity of the observational process, par- 
ticularly within pedagogical settings. There- 
fore, the present study was designed to 
answer the question: What effect, if any,
does nonparticipant observation have on
pupil behavior and on teacher-pupil inter- 
actions in the classroom? 


METHOD
Subjects 


Subjects were six adult, female residents of a 
state school for the mentally retarded and their 
language development teacher, a 23-year-old fe- 
male, with a bachelor of arts degree in psychology. 
She was employed as a child care aide and language 
instructor at the institution and had been con- 
ducting language classes for six months at the 
time of the study. 

The students ranged in age from 20 to 44 (M =
31 years), length of institutionalization from 4 to
30 years (M = 15.2 years), and intelligence
quotients from 52 to 65 (M = 57.7), the latter 
falling in the mild to moderately retarded range 
(Heber, 1961). Only two of the students were con- 
sidered occasional behavior problems in their resi- 
dential unit. 


Setting 


The study was conducted in the students’ resi- 
dential unit, in a room specifically allocated for 
language classes. The class met from 9:00 to 9:30 
a.m., Monday through Friday, and was conducted 
using the Peabody Language Development Kit, 
Level I. The teaching procedures were based on 
behavior modification principles as part of a token 
economy program at the institution. 

A remote-controlled video camera with a wide 
angle lens and a microphone were placed on the 
top surface of a metal cabinet in the classroom two 
weeks before the present study began. The cables 
from the camera and microphone were led out of 
the room through a louvered ventilation grate, 
down a hall, and into an equipment room whose 
entrance could not be seen from the language 
classroom. A videotape recorder and monitor were 
located in this equipment room and were used to 
record the classroom behavior. The equipment re- 
mained in position for the duration of the study. 


Procedure 


After two weeks were allowed for the class to 
become adjusted to the presence of the camera 
and microphone, 30 days of classroom behavior 
were recorded on videotape using an ABAB design 
(Baer, Wolf, & Risley, 1968). The conditions were: 
(Condition 1) 10 days, observer absent; (Con- 
dition 2) 10 days, observer present; (Condition 3) 
5 days, observer absent; and (Condition 4) 5 days, 
observer present. The teacher was led to believe 
that the camera and microphone were operating 


* Appreciation is expressed to Barbara Salisbury, 
the teacher, whose understanding permitted the 
completion of the study.

only when the observer (the experimenter) was in
the classroom. Further, she was told that in order 
for the camera to work the observer must turn it 
on prior to each class he was recording. This, in 
fact, was not necessary; the camera was function- 
ing during every class. She was given the set, with 
a detailed written rationale, that the purpose of 
the study was to ascertain whether an observer 
viewing videotapes of a classroom could record 
behavior as accurately as an observer in the class- 
room. She was unaware of the true nature of the 
present study and believed she was participating in 
an entirely different experiment, the “blind” (re- 
liability) study. 

Condition 1 (observer absent, 10 days). Since 
the observer was not present in the classroom, the 
teacher was unaware of being monitored. She was
told that the “blind” study had to be temporarily 
delayed until the experimenter had obtained and 
trained a videotape observer. 

Immediately preceding or just following the 
start of each class, the experimenter or his as- 
sistant surreptitiously entered the equipment room. 
Two to five minutes after the beginning of each 
class, he began to tape the session. He recorded 12 
to 15 minutes of classroom behavior, turned off 
the equipment, and left the room without the 
teacher's or students’ knowledge. There were days 
when the teacher was aware that either the ex- 
perimenter or his assistant was in the equipment 
room; however, she was unaware of the recordings 
taken during this condition. 

Condition 2 (observer present, 10 days). Under 
Condition 2 the teacher was led to believe that 
the "blind" study had begun. She was told that an 
observer would be present in the classroom and 
would take time-sample recordings of classroom
behaviors which would then be compared to rat- 
ings made by a videotape observer from tapes 
of the same class periods. She was informed that 
10 days of recordings were necessary for an ac- 
curate comparison of the ratings. 

Immediately prior to each class, the experi- 
menter entered the classroom and pretended to 
turn on the camera. He then left the room to get 
his clipboard, stopwatch, and rating forms and 
reentered the classroom, taking a seat out of the 
camera’s viewing range. The experimenter main- 
tained the attitude of a nonparticipant observer, 
avoiding eye contact or interactions with either 
the teacher or students. Although the experimenter 
rated classroom behavior for 10 minutes (enhanc- 
ing the blind given the teacher), the classroom be- 
havior was taped exactly as it was in Condition 1; 
that is, 12-15 minutes were videotaped by an
assistant after the experimenter had been in the
classroom for approximately 2 minutes. The ex- 
perimenter occasionally interacted with the teacher 
following the class, but he never explained his 
ratings or made comments or criticisms regarding 
the class.

Condition 3 (observer absent, 5 days). On completion
of Condition 2 the teacher was told that
the videotape rater would be making his ratings
during the following week and if the reliability 
between observers was either consistently high or 
low, then the 10 recorded days would be enough; 
if not, 10 more days of recorded classroom be- 
havior would be needed. Condition 3 was then 
conducted in the same manner as Condition 1, 
while the videotape rater purportedly rated the 
tapes made under Condition 2. 

Condition 4 (observer present, 5 days). On the 
last day of Condition 3 the teacher was shown per- 
centage agreements, ostensibly classroom and 
videotape observer reliabilities, that were neither 
consistently high nor low. She was informed that 
10 more days of classroom recording would be 
needed. Only the first 5 days of the recording 
constituted Condition 4, using the same proce- 
dures as Condition 2, while the next 5 days' record- 
ing were used to train raters for the present study. 

Debriefing. On completion of recording, the 
teacher was told the true nature of the study. 
She was given permission to view all recordings, 
and it was agreed that data would not be used 
without her expressed permission. The teacher re- 
ported no knowledge of the true nature of the 
study while it was in progress. 

Behavioral ratings. Two male undergraduates‘ 
were selected to be behavioral raters and received 
course credit for participation. Given the same
“blind” as the teacher, they were told they were
the videotape raters in a study assessing the agreement
between an observer in a classroom and a
videotape observer. Each was given a manual 
describing the “blind” study and the behavioral 
rating forms. Since the presence of the observer in 
the classroom could not be determined by viewing 
the tapes, the raters believed that the observer 
had always been present. 

The raters were first trained to rate classroom 
behaviors with a modified version of the Teacher- 
Pupil Interaction Chronograph (TPIC; Meyers & 
Craighead, 1973). Each TPIC form was composed 
of six 8 × 10 matrices; each matrix was used to
rate 30 seconds of interaction between the 
teacher and pupils, three minutes per form. The 
following eight categories of pupil behavior were 
represented along the top of each matrix: 

1. Gross motor—getting out of chair, standing 
up, walking around, moving chairs, running (unless 
specified by teacher). 

2. Orienting—turning head or body to interact 
with another person not specifically instructed by 
the teacher, must be of four seconds duration, 45 
degrees or more for the body and/or 90 degrees or 
more for the head, using the chair as a reference. 

3. Object noise—tapping tokens or other objects,
clapping, tapping of feet (unless specifically
instructed by teacher).


‘Appreciation is expressed to David Barnes,
Norvin Cooley, and William Fernan, who served
as raters. Appreciation is also expressed to Jerome
Gordon, Ronald Madle, and Laurent Hahn for
their assistance with the videotape aspects of this
study.

4. Disturbance of others and contact (nonver- 
bal)—disturbance of other's property or tokens, 
throwing objects at another pupil, hitting, pinch- 
ing, kicking, shoving, slapping, striking with thrown 
object, poking (any physical contact unless specifi- 
cally instructed by teacher). 

5. Verbalization—inappropriate conversation 
with another student, answering teacher without 
being called on, making comments or calling out, 
calling teacher’s name to gain attention, crying, 
singing, screaming, laughing (they must be inap- 
propriate). 

6. Ignoring—ignoring the teacher, may be in- 
volved in other inappropriate behavior or may be 
just sitting doing nothing whatsoever. 

7. Other inappropriate—other time off teacher 
defined tasks not included in categories listed 
above. 

8. Appropriate behavior—time on teacher de- 
fined tasks. 


Ten categories of teacher behavior were ar- 
ranged along the left side of the matrix: 

1. Positive prompt—a statement of positive ex- 
pectations or requirements which incorporates a 
reference to the positive consequences of the be- 
havior for the student. 

2. Negative prompt—a statement of negative 
expectations or requirements which incorporates a 
reference to the negative consequences of the behavior
for the student.

3. Positive statement—expression of expectation, 
encouragement, a command, instruction or sug- 
gestion without reference to the consequences. 

4. Negative statement—expression of expecta- 
tions, discouragement, prohibition, without refer- 
ence to consequences. 

5. Positive verbal feedback—compliments, 
praise, verbal positive reinforcement, positive 
feedback, often of the form of repeating correct 
pupil verbal response.

6. Negative verbal feedback—reprimands, negative
feedback, derogatory remarks, audible
“shushing.”

7. Positive nonverbal feedback—smiling, pleas- 
ant expression, positive gesture, affectionate or 
complimentary pat.

8. Negative nonverbal feedback—frowns, grim- 
aces, negative gestures, also pulling, shoving, or 
restraining the student. 

9. Ignore, no response—behavior of the student 
followed by no response by teacher, she turns 
away or does not react verbally or nonverbally, al- 
though it must be obvious that the teacher has 
seen or heard the behavior.

10. Neutral—questions or conversation without 
positive or negative valence, classroom announcements.


By matching a particular row with a particular 
column (or column with row) and recording the 
initial of the pupil with whom the teacher was interacting,
the raters could record the teacher-pupil
interactions. The first four teacher behaviors
had to precede the behavior of a student, and 
Teacher Behaviors 5 through 9 had to follow the 
behavior of a student. Neutral teacher behavior 
could precede or follow student behavior. Each 
token dispensed during an interaction was recorded 
by placing a check over the student's initial. 

During training the experimenter and the two 
raters recorded simultaneously, one TPIC form at 
a time. Training continued until percentage agree- 
ments (Number of Rater Agreements on Recorded 
Behaviors divided by the Number of Recorded 
Agreements plus Disagreements times 100) per 
TPIC form between the raters and between each 
rater and the experimenter were 83% or above for 
the equivalent of five days of classroom rating, 15 
TPIC forms per rater. 
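
The agreement index described above reduces to a single ratio, and the training criterion to a check over forms. The following Python sketch illustrates the arithmetic only. The function names and the example tallies are hypothetical and are not taken from the study; the 83% threshold is the TPIC training criterion stated in the text.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Agreements divided by (agreements + disagreements), times 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no recorded behaviors to compare")
    return 100.0 * agreements / total


def meets_criterion(form_scores, criterion=83.0):
    """True when every form's agreement is at or above the criterion."""
    return all(score >= criterion for score in form_scores)


# Hypothetical tallies for three TPIC forms: (agreements, disagreements).
example_forms = [(45, 5), (38, 6), (50, 4)]
scores = [percent_agreement(a, d) for a, d in example_forms]
print(scores, meets_criterion(scores))
```

Computing the index per form, rather than pooling all forms, mirrors the per-form criterion described in the text: one poorly rated form cannot be hidden by several well rated ones.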

Following TPIC training, the taped classes 
were randomized and assigned to the raters to 
rate individually. Every fifth class served as a 
reliability check. The experimenter and one rater 
were present during independent ratings; the ex- 
perimenter ran the equipment. The rater began 
rating at an arbitrarily predetermined time on the 
tape (ostensibly the time the classroom observer 
began his ratings) by beginning his stopwatch 
when that time was visible on a clock in the taped 
classroom. The rater completed nine minutes of 
TPIC rating per class period. He viewed the as- 
signed section of tape as many times as he felt 
necessary for accuracy. The procedure for re- 
liability checks was the same except both raters 
were present, rating simultaneously. 

On completion of TPIC ratings, the raters were 
trained with the Overall Behavior Checklist 
(OBC), a modified form of the Time-Sample Behavior
Checklist described elsewhere (Meyers &
Craighead, 1973). Each OBC yielded four minutes 
of behavioral ratings for each student regardless 
of interactions with the teacher. Each minute was 
divided into six 10-second intervals; five to record 
behaviors and the sixth to make comments on the 
checklist. It was possible to record any of eight 
student behaviors (those listed for the TPIC, 
above) in each 10-second interval. Training proce- 
dures were identical to TPIC training and con- 
tinued until the percent agreements per OBC be- 
tween raters and between each rater and the 
experimenter were consistently above 85% for the
equivalent of rating three students per day for 
five days or 15 OBCs per rater. 

Following OBC training, the taped classes were 
again randomized and assigned to raters. Every 
fifth class served as a reliability check. Rating 
procedures were similar to TPIC monitoring, ex- 
cept the time to begin rating was set later in each 
class to avoid repeating sections covered by TPIC 
rating. The rater completed four minutes of OBC 
rating for each student; each student was rated 
separately over the same time period. The rater 
viewed the assigned section of tape as many times 
as necessary for accuracy; the rating was done 
during the last viewing of each tape.

Following OBC ratings, the raters were trained
to count the number of tokens dispensed per min- 
ute by using five 10-minute periods, the equivalent 
of 10 minutes of rating per class for five classes; 
100% agreement was obtained for all five 10-min- 
ute training trials. The raters were told token 
counting was necessary to determine the accuracy 
of the TPIC in assessing the number of tokens 
dispensed per class. 

Because of perfect training agreement only one 
rater counted tokens for all classes. The rater 
began counting at the identical times as were used 
for TPIC ratings and counted for 10 minutes per 
class. 

Dependent variables. Six dependent variables 
were obtained from the TPIC, OBC, and token 
count. They were: (a) number of teacher-pupil 
interactions; (b) proportion of appropriate 
teacher-pupil interactions; (c) proportion of inap- 
propriate teacher-pupil interactions; (d) pro- 
portion of neutral teacher-pupil interactions; (e) 
proportion of appropriate student behaviors; and 
(f) number of tokens dispensed. 

The number of teacher-pupil interactions was 
determined by counting the number of interactions 
recorded on the TPIC per day. The proportions 
of appropriate, inappropriate, and neutral teacher- 
pupil interactions were determined by first arbi- 
trarily designating cells in the TPIC matrix as ap- 
propriate, inappropriate, and neutral from a 
behavior modification conceptualization (see 
Meyers & Craighead, 1973). Positive prompts, positive
statements, positive verbal feedback and
positive nonverbal feedback occurring in conjunc- 
tion with appropriate student behavior, and teacher 
ignoring occurring in conjunction with inappropri- 
ate student behaviors were defined as appropriate 
teacher-pupil interactions. Neutral teacher behavior
occurring in conjunction with any student
behavior was defined as neutral teacher-pupil in- 
teraction. All remaining combinations of behaviors 
were defined as inappropriate teacher-pupil interactions.
The proportion appropriate was then determined
by dividing the number of appropriate
interactions per day by the total number of interactions
per day. The proportions inappropriate and
neutral were similarly calculated. The proportion
appropriate student behavior was determined by
finding the number of appropriate 10-second intervals
per pupil per day on the OBC and dividing
that by 20, the total possible number of appropriate
10-second intervals per day. The number of
tokens dispensed was calculated by counting the
number of tokens per 10-minute interval per day.

Figure 1. The percentage of appropriate student
behavior for all students (open triangles) and the
percentage of appropriate (solid circles), inappropriate
(open circles), and neutral (solid triangles)
teacher-pupil interactions for each day of
each treatment condition.
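
The scoring rules for the interaction and student-behavior proportions can be sketched in Python as follows. The category numbers follow the TPIC lists given earlier (teacher behaviors 1 to 10, pupil behaviors 1 to 8, with 8 being appropriate behavior). The function names and the sample day of interactions are hypothetical, and the sketch is an illustration of the stated rules rather than the authors' scoring procedure.

```python
POSITIVE_TEACHER = {1, 3, 5, 7}  # positive prompt, statement, verbal and nonverbal feedback
IGNORE = 9                       # teacher ignores the behavior
NEUTRAL = 10                     # neutral teacher behavior
APPROPRIATE_PUPIL = 8            # pupil categories 1-7 are inappropriate


def classify_interaction(teacher: int, pupil: int) -> str:
    """Label one TPIC cell as appropriate, inappropriate, or neutral."""
    if teacher == NEUTRAL:
        return "neutral"
    if teacher in POSITIVE_TEACHER and pupil == APPROPRIATE_PUPIL:
        return "appropriate"
    if teacher == IGNORE and pupil != APPROPRIATE_PUPIL:
        return "appropriate"
    return "inappropriate"  # all remaining combinations


def daily_proportions(interactions):
    """Proportion of each interaction type among one day's interactions."""
    if not interactions:
        raise ValueError("no interactions recorded for this day")
    counts = {"appropriate": 0, "inappropriate": 0, "neutral": 0}
    for teacher, pupil in interactions:
        counts[classify_interaction(teacher, pupil)] += 1
    return {label: n / len(interactions) for label, n in counts.items()}


def proportion_appropriate_student(appropriate_intervals: int,
                                   possible_intervals: int = 20) -> float:
    """OBC score: appropriate 10-second intervals out of the 20 possible per day."""
    return appropriate_intervals / possible_intervals


# Hypothetical day: (teacher category, pupil category) pairs.
day = [(5, 8), (9, 3), (10, 2), (6, 8)]
print(daily_proportions(day), proportion_appropriate_student(18))
```

The default of 20 possible intervals follows from the OBC layout: four minutes of rating per student with five scored 10-second intervals per minute.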


RESULTS 


Rater percent reliabilities (Number of 
Rater Agreements on Recorded Behaviors 
divided by the Number of Recorded Agree- 
ments plus Disagreements times 100) for 
the TPIC and OBC were calculated at re- 
liability checks every fifth taped class. 
TPIC reliabilities ranged from 84% to 100%,
with a mean of 92.4%. OBC reliabilities 
ranged from 85% to 100%, with a mean of 
97.8%. Because of 100% agreement during 
token count training, no reliability checks 
were made for the token count. 

Figure 1 presents the percentage of ap- 
propriate student behavior per day for all 
students under Conditions 1, 2, 3, and 4. 
After arc sine transformations for proportional
data, these data were submitted to
one-way repeated measures analysis of vari- 
ance for unequal cell frequencies; there were 
no significant differences in appropriate 
student behavior across treatment con- 
ditions (F < 1.00). 
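
The text does not state which form of the arc sine transformation was applied. The Python sketch below assumes the common variance-stabilizing form, the arc sine of the square root of the proportion, in radians; some texts use twice this value, which only rescales the result. The function name and the sample proportions are hypothetical.

```python
import math


def arcsine_transform(p: float) -> float:
    """Arc sine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical daily proportions of appropriate student behavior.
daily = [0.85, 0.90, 0.95, 1.00]
transformed = [arcsine_transform(p) for p in daily]
print(transformed)
```

The transform stretches the scale near 0 and 1, where raw proportions have compressed variance, which is the usual reason for applying it before an analysis of variance.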

The percentage appropriate, inappropri- 
ate, and neutral teacher-pupil interactions 
under Conditions 1 through 4 are also pre- 
sented in Figure 1. After arc sine transforma- 
tions for the proportional data, they also 
were submitted to one-way repeated mea- 
sures analyses of variance for unequal cell 
frequencies. No significant treatment effects
were found in any of the three analyses
(F < 1.00; F = 1.186, df = 3/26; F < 1.00,
respectively). 

Figure 2 presents the number of teacher- 
pupil interactions per day under Conditions 
1, 2, 3, and 4. These data were submitted to 
one-way repeated measures analysis of variance
for unequal cell frequencies, and a significant
difference was found between conditions
(F = 6.171, df = 3/26, p < .003).
A Scheffé post hoc complex comparison of
means for Conditions 1 and 3 versus 2 and
4 was significant (-39.37 < ψ < -3.73,
p < .05), with the mean for Conditions 2 and
4 being greater. Then t tests for differences among
means of groups with unequal Ns were
performed on the data. The means of Conditions
1 and 3 were not significantly different
(t = 1.40, df = 26) nor were the
means of Conditions 2 and 4 (t = .70, df =
26). The mean of Condition 1 was significantly
different from the means of Conditions
2 and 4 (t = 2.91, df = 26, p < .05
and t = 3.61, df = 26, p < .05, respectively)
with the means of Conditions 2 and 4 being
greater. The mean of Condition 2 was not
significantly different from the mean of
Condition 3 (t = 1.50, df = 26), although
the mean of Condition 3 was less (see Figure
2). Lastly, the mean of Condition 3 was
significantly different from the mean of
Condition 4 (t = 2.21, df = 26, p < .05),
with the mean of Condition 4 being greater.

Figure 2 also presents the number of 
tokens dispensed per day under Conditions 
1, 2, 3, and 4. These data were submitted to 
one-way repeated measures analysis of vari- 
ance for unequal cell frequencies; no sig- 
nificant differences (p < .05) between con- 
ditions were obtained (F = 2.792, df = 
3/26, p = .06). A Pearson product-moment 
correlational analysis yielded a significant 
positive correlation (r = .74, df = 58, p < 
.005) between the number of tokens dis- 
pensed and number of teacher-pupil inter- 
actions per day. 
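
For readers unfamiliar with the statistic, the product-moment correlation reported above can be computed as in the following Python sketch. The function name and the daily counts are hypothetical and are not the study's data; the sketch shows only the standard formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily counts, for illustration only.
interactions_per_day = [40, 55, 62, 48, 70]
tokens_per_day = [12, 18, 20, 15, 25]
print(pearson_r(interactions_per_day, tokens_per_day))
```

A high positive value indicates only that the two daily counts rise and fall together; it does not show which one drives the other.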


DISCUSSION


OBC data revealed no significant differ- 
ences in the percentages of appropriate stu- 
dent behavior across conditions. The initial 
high level of appropriate student behavior 
may have precluded the demonstration of 
an experimental effect in the direction of 
more appropriate behavior during observa- 
tion. The quality of teacher-pupil interac- 
tions, assessed by the TPIC, was not affected 
by observation; neither the percentage of 
appropriate, inappropriate, nor neutral 
teacher-pupil interactions changed signifi- 
cantly across conditions. The low initial 
baseline made it difficult to demonstrate a
decrease in percentage of inappropriate behavior
as a result of observation, such as
was suggested by White's (1973) data.

Figure 2. The total number of teacher-pupil interactions
(solid circles, condition means indicated
by broken lines) and the total number of tokens
dispensed by the teacher (open triangles, condition
means indicated by solid lines) for each
day of each treatment condition.

The TPIC data did indicate a significant
effect on the quantity of teacher-pupil interactions.
When nonparticipant observation
was instituted (Condition 2), the number
of interactions increased significantly;
when removed (Condition 3), the number of
interactions decreased, with the reversal approaching
statistical significance. With the
reinstatement of observation (Condition 4) 
the number of interactions again increased 
significantly. 

If reversibility does not occur in the 
ABAB design, the effect of the experimental 
variable remains unclear. In the present 
study the reversal only approached signifi- 
cance. If the effects of treatment conditions 
are not completely transient and reversible, 
the behavior produced in the experimental 
condition may show some resistance to extinction
when there is a return to baseline
conditions (Kazdin, 1973). Perhaps increased
classroom interaction is not a transient,
readily reversible phenomenon. Anecdotally,
the teacher reported increased familiarity
with the class during the study. If
this increased familiarity resulted in increased
interaction over the period of the
study, beyond the effects of the observa- 
tional process, the obtained results are not 
surprising. 

It is clear, however, that significant
increases in interaction were associated with
the introduction of observation. Further, the
combined interactions of Conditions 2 and 4
(observer present) were significantly greater
than the combined interactions of Conditions 1
and 3 (observer absent). These results argue
strongly that the conditions of nonparticipant
observation did increase the quantity, although
not the quality, of teacher-pupil interactions.

There was a significant positive correlation
between the number of tokens dispensed per day
and the number of interactions per day. Further,
the trend in the mean number of tokens dispensed
per condition was similar to that of the mean
number of interactions per condition. It is
difficult to determine whether the number of
tokens varied as a function of the number of
interactions, vice versa, or if both varied as a
function of some third variable (e.g., presence
of observation).

The data do not indicate habituation to 
observation; in fact, the teacher appeared 
periodically resensitized to observation as 
indicated by the increased number of inter- 
actions approximately half way through 
each observational condition. It has been hy- 
pothesized that at least 10 to 12 days are re- 
quired for habituation to an observer's pres- 
ence (Medley & Mitzel, 1963). If this is true, 
it is highly unlikely that effective habitua- 
tion would have occurred during the present 
study’s observational conditions, 10 and 5 
days respectively. Even if habituation is a 
demonstrable phenomenon, it may not be 
feasible economically to have observers pres- 
ent long enough for the habituation to take 
place; moreover, the researcher’s choice of 
the point at which the subjects’ behavior is 
“normal” (i.e., free from observer influence) 
is an arbitrary one (Masling & Stern, 1969). 

Other observational reactivity was noted
anecdotally. The teacher related an “inability”
to act normally while being observed and asked
the experimenter how he would like her to act
for the study (she was told to act normally).
Like Gussow’s (1964) teachers, she showed
interest in what was being recorded and in what
manner the records were to be used. Although
data were not recorded (at the teacher’s
request), it appeared to the experimenter that
the amount of time standing in front of the
classroom while teaching varied as a function of
conditions. The students attempted to interact
with the experimenter as he observed in the
classroom and occasionally asked the
experimenter after class what he had been
recording. They were also overheard telling
other residents of the institution they were “on
TV.” Nevertheless, student cognizance of
observation apparently had no effect on student
behavior across conditions.

The effect of increased interactions cannot be
attributed solely to the presence of an
observer. Under Conditions 2 and 4 (observer
present) the subjects were aware of being
recorded on videotape for the “blind study.” The
systematic confounding of videotape recording
with observer presence made the independent
variable nonparticipant observation. A cleaner
design would call for the use of concealed
observation undetectable by the subjects.

The wide range of student ages and years 
of institutionalization and their more limited 
range of IQs, not atypical of the institu- 
tionalized retardate, argue for generalization 
to a large female resident population. The 
generalization from the sample teacher is 
perhaps more limited. She appeared very 
capable in applying behavior modification 
principles to classroom control (e.g., low 
levels of inappropriate teacher-pupil inter- 
actions under all conditions). A competent 
teacher may have little reason to feel threat- 
ened by the presence of an observer familiar 
with the principles applied to her teaching 
and therefore be unlikely to change her 
overt behavior. Further, both the teacher 
and the pupils were observed on a periodic 
basis within the institution, though not in 
this particular class. A greater reactivity 
may be detected in a setting where periodic 
observation is not the rule. 

Feedback from observers concerning attending
behavior to appropriate child responses can
alter teacher attending behavior (Cooper,
Thomson, & Baer, 1970), and data (Panyon, Boozer,
& Morris, 1970) have suggested that feedback to
attendants may act as a reinforcer for applying
operant techniques. It has also been noted that,
in reinforcement programs where the presence of
an observer is associated with implementation of
a program to establish behavior, the target
behaviors may come under discriminative control
of the presence of the observer (Kazdin, 1973;
Surratt, Ulrich, & Hawkins, 1969). Since the
present study associated neither feedback nor
the application of operant techniques with
observation, generalization to studies which
have included such procedures should be drawn
cautiously. When feedback is involved, the
potential for greater reactivity exists.
However, since the speed of acquisition of a
target behavior is a direct function of the
number of reinforcements, one must also be
cautious in comparing the findings of studies
which differ in whether an observer was present
or absent; the present findings suggest that the
presence of an observer increases the number of
interactions, and thus the frequency of
reinforcement.

The finding that observation was associated
with increased frequency of teacher-pupil
interaction would not have been predicted from
White's (1973) finding that observer presence
was associated with decreased activity level of
mother-child dyads. These opposite findings
suggest that the nature of observer reactivity
may be situation specific; therefore, it is
essential that the phenomenon be evaluated in a
variety of settings. It may be concluded that
observational reactivity does occur, however
minimal its effects, and it should be taken into
account in generalizing from observational
studies. Further research should indicate the
parameters which are important contributors to
the “observation reactivity” limitations on
generalization.
Several previous investigations have revealed contrasting patterns of
interaction by teachers with students toward whom they hold attitudes 
of attachment, concern, indifference, or rejection. The present study 
focused on discovering the origins of these teacher attitudes by collect- 
ing first-grade teachers’ early impressions of students. Teachers were 
interviewed during the first two weeks of school, prior to readiness test 
administration, and their comments about students whom they later 
assigned to each of the four attitude groups were analyzed. The find- 
ings were quite consistent in yielding distinct profiles for children in 
each respective group, complementing and extending previous research 
by indicating the student attributes associated with the formation of
these four teacher attitudes.


Silberman (1969) asked teachers to nominate
one student in their class to each of four
attitude groups: (a) attachment (If you could
keep one student another year for the sheer joy
of it, whom would you pick?); (b) indifference
(If a parent were to drop in unannounced for a
conference, whose child would you be least
prepared to talk about?); (c) concern (If you
could devote all your attention to a child who
concerned you a great deal, whom would you
pick?); (d) rejection (If your class was to be
reduced by one child, whom would you be relieved
to have removed?).

He then observed for 20 hours in each class to
discover what these children were like and how
their teachers interacted with them. He found
the attachment students to be “model” students,
high achievers who conformed to the teachers’
wishes and fulfilled their personal needs.
Observation of the teachers interacting with
these attachment students showed some evidence
of subtle favoritism, but no gross favoritism.

Concern students tended to be dependent, 
low-achieving students who made extensive
but approved and appropriate demands 
upon the teachers. The teachers interacted 
most frequently with these students and in 
general behaved in ways consonant with 
their expressed concern about these stu- 
dents' achievement levels. 

The indifference students did not have any
particular identifying characteristics except
for their low frequencies of interaction with
the teachers. In observing the teachers,
Silberman noted that teachers’ interactions with
indifference students not only were infrequent
but also were briefer and less emotionally
involving than those with other students.

The rejection students tended to be behavior
problems who made demands that the teachers saw
as illegitimate or overwhelming. Teachers had
frequent contacts with these students, but a
large proportion of these contacts involved
intervention to control their misbehavior. Yet
these students also received considerable
teacher praise, as if the teachers were
attempting to “make up for” the generally
negative tenor of their interaction with them.
Additional research on the student
characteristics and teacher-student interaction
patterns involving students in Silberman's four
attitude groups has been done by Jenkins (1972),
Good and Brophy (1972), and Brophy and Good
(1974). All of these studies generally support
Silberman's results and impressions, although
there are some inconsistencies. The teachers in
Jenkins’ study saw the attachment students as
high achievers and as students with warm rather
than neutral or hostile attitudes toward
themselves (the teachers). Good and Brophy
(1972) found the attachment students to be
well-behaved high achievers. Similar findings
were obtained by Brophy and Good (1974). Thus,
in general, attachment students appear to be
high-achieving students who conform and respond
warmly to teachers. Nevertheless, none of the
four studies found any grossly overt favoritism
of these students, although they all suggested
that the teachers favored attachment students in
subtle ways that were not being picked up by the
behavioral measures included.
The four studies agree in showing the 
concern students to be low achievers (the 
primary reason for teacher concern) and 
also suggest that these students are depend- 
ent upon teachers but express this depend- 
ency in ways that the teachers find reward- 
ing or at least acceptable. Thus, they re- 
spond to these students with concern about 
their low achievement and with redoubled
efforts to do something about it through 
more frequent contacts and provision of tu- 
torial help and other aids to learning. 
The four studies find that rejection stu- 
dents are mostly low achievers, but not as 
consistently low as the concern students. 
However, students in the rejection group 
apparently “turn off” teachers. This may be 
done actively through defiance or disobedi- 
ence, or more subtly through failure to re- 
spond positively to the teachers’ overtures. 
Findings regarding teacher interactions with
rejection students are mixed. Most studies have
found the pattern noticed by Silberman: evidence
of a strained teacher-student relationship
featuring frequent disciplinary contacts and
criticism but tempered by frequent praise and
other evidence of teacher attempts to “make up
for” negative behavior. However, Good and Brophy
(1972) found a clear-cut pattern of rejection
untempered by any evidence of attempts to
compensate by providing positive behavior toward
rejection students.

A common thread running throughout 
these studies is that students in the concern 
and rejection groups appear to be quite 
similar at the gross observational level, 
with the exception that rejection-group stu- 
dents are more likely to be classroom disci- 
pline problems than concern-group stu- 
dents. Teachers show sharply contrasting 
reactions to these two types of students, 
however. They tend to be quite supportive 
and attentive primarily to the academic 
needs of students in the concern group but 
to be rejecting and primarily attentive to 
the conduct of students in the rejection 
group. 

The four studies agree in finding students 
in the indifference group to be characterized 
primarily by low rates of interaction with 
the teacher. Likewise, data from some of 
the studies support Silberman's observa- 
tions that teacher interactions with them 
are brief and low in emotional intensity. 
Whereas teachers tend to respond warmly 
to the attachment group, become concerned 
about students in the concern group, and 
reject or develop conflictual responses to- 
ward students in the rejection group, they 
respond with indifference or apathy toward 
students in the indifference group, often 
acting as if they were unaware that these 
students were even in the room. The student 
attributes that trigger this reaction (or, 
more properly, nonreaction) have not been 
identified, however. 

The purpose of the present study was to 
further explore these four attitude groups to 
try to identify the student characteristics 
that trigger these four attitudinal responses 
in teachers. More particularly, this study 
sought to identify some of the descriptive 
characteristics of indifference-group stu- 
dents and some of the differences between 
concern- and rejection-group students
which might explain the strongly contrast- 
ing teacher reactions to these two groups. 
METHOD


Subjects 


Subjects were 28 female first-grade teachers and 
their students, drawn from a large urban public
school system. The students were almost all from
white, middle-class homes. Due to a teacher cross- 
over plan for desegregation purposes, 9 of the 
teachers were black. All teachers worked in self- 
contained classrooms in which they taught their 
children all of the basic academic subjects. Teach- 
ers working in team-teaching arrangements or in 
special education classes were not included in the 
study. The school system was selected because its 
children do not attend kindergarten (unless they 
attend private kindergartens). Thus, when the chil- 
dren entered the first-grade classrooms of the 
teachers in the study, they were unknown to the 
teachers and did not bring with them records con- 
taining test data or information about kindergarten 
performance. Therefore, except in a few cases where 
the teachers were neighbors of the children or 
where they knew the family from having taught 
an older sibling, the teachers and students were 
unfamiliar with one another before school began. 


Procedure 


Each teacher was seen four times, once to ex- 
plain the purpose of the study and solicit her co- 
operation, and three times to conduct interviews in 
her classroom after school hours. At the first meet- 
ing, teachers were informed that they would be in- 
terviewed at three points during the year to find 
out what they had noticed about each of their stu- 
dents. It was explained that very little research in- 
formation is available about the kinds of student 
characteristics that teachers notice and use in form- 
ing impressions about students and that the pur- 
pose of the study was to gather such information. 
Teachers were informed that interviews would be 
informal but would be tape recorded and were as- 
sured that all information would be strictly confi- 
dential. No difficulty was encountered in obtaining 
cooperation from teachers. They found partici- 
pation in the study to be stimulating and enjoy- 
able, and the guaranteed confidentiality apparently 
eliminated any hesitation that they may have had 
about participating or about speaking freely. 

Interviews were conducted by the first author 
and by two middle-aged female assistants trained 
by her for this purpose. Interviews were conducted 
at three points: (a) during the first two weeks of 
school, before any test data were available; (b) 
one to two weeks after the Metropolitan Readiness 
Test had been administered and scored by the 
teachers (about four weeks after the beginning of 
school); and (c) during the second and third weeks
in January, at the beginning of the second semes- 
ter. Interviews averaged about an hour, although 
they varied by teacher. For the first interview, the 
teacher received the following instructions: 


The purpose of this interview is for you to 
discuss briefly each child in your class. In
discussing each child, you should indicate the
characteristics and actions you have noticed about
each child. Do not try to limit what you say to 
only one type of information—just mention any- 
thing you know or have observed about this 
child. You don't have to use complete sentences 
—you may just give some adjectives or phrases 
which describe the child. 


Directions for the second and third interviews 
were as follows: 


We will be discussing your students as we did 
the last time. Now that you have had your stu- 
dents for approximately a month [four months 
at the third interview], we are interested in what 
you have noticed about them. Feel free to repeat 
anything you said last time about a child or to 
add anything new you have noticed about the 
child. You don’t have to use complete sentences 
—you may just give some adjectives or phrases 
which describe the child. 


Following these instructions, the interviewer
then began naming the children in the teacher's
class and asked her to respond, following a
prearranged randomized order of names. Once the
interview began, the interviewer's primary function
was to act as an interested listener and to
encourage the teacher to talk comfortably. Since the
interviews were tape recorded, the interviewers did
not have to take down the information and could
spend their time responding to the teacher and
encouraging her to continue talking. Ground rules
were set up for interviewers to help insure that
they did not cue teachers to give certain kinds of
information or begin to reinforce them for certain
kinds of comments. Also, teachers were instructed
to confine their responses to what they had
observed about the child and to omit discussions of
test scores, attendance data, or long anecdotes
which were intended to exemplify or justify a
general statement about the child that had just been
made. The interviewer avoided making evaluative
comments on the teacher's statements. Teachers
were encouraged to continue talking about a given
child as long as they had additional things to say.
When they seemed to have exhausted their
perceptions about a child, the interviewer then named
the next child.

After each interview, the teachers were asked to
rank their students in order according to their
expected achievement levels. These data were used
as part of an investigation of the student
characteristics noticed and used by teachers in forming
impressions about students' academic potential
(see Willis, 1972). Also, following the third
interview, the teachers were asked to nominate
three students to each of the four attitude groups
(attachment, indifference, concern, and rejection),
using the method of Silberman (1969) described
previously. These attitude-group nominations
formed the basis for the present study, which
allowed investigation of the characteristics of
students nominated to these four attitude groups.


Coding the Interviews 


The tape-recorded interviews were transcribed
and then coded according to a system devised by
the present authors. Coding categories had not
been established prior to the interviews; they were
established after the interviews had been
completed, based upon observed responses. An initial
system was devised in response to a set of five
interviews, and the system was then revised after it
had been tried out with an additional five
interviews. The revised system was then used to code
all of the interviews, with intercoder agreement
being 85% (number of agreements divided by the
number of agreements plus the number of
disagreements). Disagreements were resolved
by discussion. During initial coding and resolution
of coding differences, the authors did not know
whether or not the child in question had been
nominated to one of the four attitude groups by his
teacher, so that the results of the study could not
have been biased by the authors’ knowledge of the
teacher's general attitude toward the student.
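
The agreement index described above can be illustrated with a minimal sketch. The function name and the example labels are hypothetical and are not taken from the study; the sketch only assumes that each coder assigns one category label to each statement.

```python
def percent_agreement(codes_a, codes_b):
    """Agreements divided by (agreements + disagreements), as a percentage."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same statements")
    if not codes_a:
        raise ValueError("no statements to compare")
    # A statement counts as an agreement when both coders chose the same category.
    agreed = sum(a == b for a, b in zip(codes_a, codes_b))
    disagreed = len(codes_a) - agreed
    return 100.0 * agreed / (agreed + disagreed)


# Hypothetical labels given by two coders to the same 20 statements.
coder_1 = ["family"] * 10 + ["health"] * 10
coder_2 = ["family"] * 10 + ["health"] * 7 + ["work habits"] * 3
print(percent_agreement(coder_1, coder_2))  # 17 of 20 agree -> 85.0
```

This simple index does not correct for agreement expected by chance.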

The teachers’ interview responses about each
student were classified into 11 major divisions.
There were 117 specific categories within these 11
major divisions, as well as a general/unclassified
category for each division to handle statements
which fell into that division but which did not
clearly fit into one of the more molecular
categories. The 11 divisions included statements
concerning the following: the physical description of
the child, the child's family, the child's health or
physical condition, the child's social or emotional
characteristics, the child's interaction with other
children, the child's attitudes or motivation
concerning school, the child's classroom behavior or
the kinds of management problems that he
presented, the child's readiness for school or
readiness-related abilities and skills, the child's oral
or verbal skills, the child's work habits or ability
to do classroom work, and the teacher's feelings
concerning the child, the nature of their
relationship, or individual interactions that she had
had with the child.

In addition to coding each statement within one
of the categories in the system, the statement was
coded for whether it was positive, neutral, or
negative. Thus, the coding yielded information both
about what kinds of things the teacher had noticed
about the child and about whether the information
was positive, neutral, or negative.

Data for the present study were taken from the
first interview, done during the first two weeks of
school. Thus, they reflect the teachers' early
impressions of the students, based solely on contact
with them in or out of class. No standardized test
data were available yet.


Data Analysis 


Each child was scored present or absent for each
of the basic categories in the system. Thus, he had
a presence/absence score for each of 117 specific
categories and 11 unclassified/general categories.
One hundred and twenty additional scores were
derived through algebraic transformation of these
original 128 categories, yielding 248 scores. The
120 additional scores were sum scores for each of
the 11 major categories and sums of positive and
negative statements within and across categories.
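
The scoring scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the tuple layout, the function name, and the example statements are hypothetical, and a division's sum score is assumed here to be the number of that division's categories scored present.

```python
from collections import Counter

# Score counts stated in the text.
N_BASIC = 117 + 11        # specific categories plus one general category per division
N_TOTAL = N_BASIC + 120   # basic scores plus derived sum scores


def score_child(statements):
    """Score one child from (division, category, valence) tuples (assumed layout)."""
    # Presence/absence: a category is present if mentioned at least once.
    present = {(division, category) for division, category, _ in statements}
    # Assumed sum score: number of categories present within each division.
    division_sums = Counter(division for division, _ in present)
    # Valence counts within each division and across all divisions.
    valence_within = Counter((division, valence) for division, _, valence in statements)
    valence_across = Counter(valence for _, _, valence in statements)
    return present, division_sums, valence_within, valence_across


# Hypothetical coded statements about one child.
example = [
    ("family", "parents cooperative", "positive"),
    ("work habits", "works independently", "positive"),
    ("work habits", "needs help", "negative"),
]
present, division_sums, valence_within, valence_across = score_child(example)
print(N_BASIC, N_TOTAL)                 # 128 248
print(division_sums["work habits"])     # 2
print(valence_across["positive"])       # 2
```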

To assess the relationship between perceived
characteristics of the students and assignment of
students to the four attitude groups, a series of
one-way analyses of variance was performed, in
which students nominated to a given attitude group
were compared with all other students of the same
sex. Comparisons were made separately within sex,
because there is evidence that attributes
associated with sex role expectations are perceived more
positively when they appear in a student of the
“appropriate” sex than when they appear in a student
of the opposite sex (Brophy & Good, 1974;
Feshbach, 1969). All students were included in the
analyses except those who were repeating the first
grade. The latter were excluded because teachers
had known them for an extra year and because
their status as repeaters probably affected the
teachers' attitudes toward them in systematic but
unknown ways.

These 248 comparisons for each attitude group 
involved 355 boys and 311 girls. Given the large 
Ns, it might seem that statistically significant 
values could be obtained even where group differ- 
ences were minor in absolute magnitude. However, 
with the exception of the sum scores for the major 
categories and a few heavily used categories, inci- 
dence of category use was low enough to prevent 
many differences from reaching the .05 level of sig- 
nificance. In fact, category usage was often so low 
that a meaningful analysis could not be performed. 
Thus, the .05 level of statistical significance was 
chosen as the criterion for inclusion of a group- 
difference finding in this report; all group differ- 
ences reported below are based on F values at or 
below the .05 significance level. 
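
For readers unfamiliar with the comparison described above, the following is a minimal sketch of a one-way analysis of variance on a single presence/absence score, contrasting students nominated to one attitude group with all other students of the same sex. The function name and the data are hypothetical; the sketch returns only the F ratio and its degrees of freedom and leaves the significance lookup to an F table or a statistics package.

```python
def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical presence (1) / absence (0) scores on one category.
nominated_boys = [1, 1, 1, 0]
other_boys = [0, 0, 1, 0, 0, 0]
f_ratio, df_b, df_w = one_way_f(nominated_boys, other_boys)
print(round(f_ratio, 2), df_b, df_w)
```

With only two groups, this F ratio equals the square of the corresponding two-sample t statistic.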


RESULTS 


The distributions of boys and girls in the
four groups are shown in Table 1. Although the
differences are not large (except in the concern
group), they reaffirm the frequent findings that
boys are more salient than girls (more likely to
be noticed and commented upon) and more likely
to be perceived negatively by teachers (Brophy &
Good, 1974).


TABLE 1

DISTRIBUTIONS OF BOYS AND GIRLS IN
THE FOUR ATTITUDE GROUPS

     |  Attachment  |   Concern    |     Indifference    |      Rejection
     | Boys | Girls | Boys | Girls | Boys        | Girls | Boys        | Girls
n    | 32   | 32    | 42   | 22    | [illegible] | 23    | [illegible] | 25
%*   | 9.7  | 11.2  | 12.7 | 7.7   | 9.1         | 8.0   | 11.2        | 8.7

* Percentage of total number of boys and girls,
respectively, who were nominated to each attitude
group (each teacher named as many as three
children to each group).

As expected, teachers’ comments about 
children in the attachment group were over- 
whelmingly positive. Concerning the boys 
nominated to the attachment group, the 
teachers made more positive comments 
about their clothing, more often said that 
they had an immature appearance, more 
often said that they had a visual impair- 
ment or required glasses, less often said 
that they were quiet, more often assigned 
them as leaders or classroom helpers, more 
often described them as helpful with other 
children, more often described them as busy- 
bodies, more often stated that they knew 
left from right and could stay within lines 
on a tablet (readiness skills), more often 
stated that they did not draw well, more 
often made negative comments about their 
reading ability, more often stated that they 
volunteered information during classroom 
discussions, more often mentioned a percep- 
tual problem or learning disability, more 
often mentioned positive classroom behav- 
ior, more often mentioned positive social 
behavior, and more often mentioned the 
student as a high-ability student. The latter 
perceptions were confirmed by these stu- 
dents' significantly higher scores on the 
Metropolitan Readiness Test. In general, 
boys in the attachment group appeared to 
be high-ability students who were well ad- 
justed to school, conformed to the teachers’ 
rules, and "rewarded" the teachers by being 
supportive of them by doing well in their 
school work, helping out, and volunteering 
information. The negative statements regarding
student ability and reading progress were
probably relative rather than absolute, in view
of the more general picture of high ability that
the teachers drew in describing these boys. In
the context of their total statements about
these boys, these negative remarks appear to be
more a matter of concern about getting the boy
to maximize his potential than of concern about
getting him to meet minimal requirements. The
comments about clothing suggest that these boys
were generally of higher socioeconomic status
than their classmates. Teacher favoritism toward
attachment boys is suggested by at least one
set of findings: even though these boys were 
perceived as busybodies, they were more 
often assigned as leaders or helpers than 
other boys were. 

Teachers described girls assigned to the
attachment group as larger than average, more
attractive than average, having interested,
cooperative parents, more likely to have a
visual impairment or require glasses, more
likely to have been to kindergarten, more likely
to be unable to write their names, more creative
and imaginative, more alert, better observers,
more likely to enjoy stories, able to work
independently on assignments, having higher
general intellectual ability, coming from
generally good families, and being high in
expected achievement. Just as with the boys, the
teachers’ statements about high ability in the
attachment group girls were borne out by
significantly higher Metropolitan Readiness Test
scores. In addition to these specifics, the
teachers had significantly more positive
comments and significantly fewer negative
comments about these girls than about other
girls. Like the boys, the girls in the
attachment group appeared to be high-achievement
students who conformed to and rewarded their
teachers. Also, there again is evidence that
they came from higher-social-class families and
that they were perceived in a generally positive
way, including physical attractiveness.

These data generally parallel previous
findings suggesting that teachers hold a general
positive halo effect toward children assigned to
the attachment groups and that, even when
negative qualities are perceived in these
students, the teachers do not respond to them
negatively as they apparently do in responding
to similar traits in students in the other three
groups.

Boys assigned to the concern group were
especially likely to be described as being of
average size, being reared by grandparents or
older parents, having a speech impediment or
using baby talk, being generally immature, being
active and vivacious, seeking teacher attention,
being able to use and keep up with school
supplies, being dependent in school work and
needing help from the teacher, needing
reassurance and approval, being of generally low
ability, needing readiness work, having a
positive attitude toward school, having
generally poor health, having generally poor
social-emotional development, having generally
poor oral and verbal skills, having generally
poor skills in the area of independent work, and
having generally low abilities. The latter
perception was confirmed by significantly lower
Metropolitan Readiness Test scores for these
boys. In addition, the teachers made
significantly fewer positive comments and
significantly more negative comments about these
boys, although the negative comments were almost
completely confined to their abilities rather
than to their personalities or cooperation with
the teachers.

Taken together, these data suggest a rather
clear picture of the boys in the concern group.
They appear to be students of low ability who
are dependent upon the teachers for help and
reassurance in completing their assignments.
However, they are also perceived as being
cooperative and compliant, so that demands they
make upon the teachers are perceived as being
legitimate. The teachers also perceive them as
poorly adjusted in their social relationships
and generally immature, again stressing their
dependency upon and need for help from the
teachers.

Teachers describe girls in the concern 
group as being more likely to be nonwhite 
than to be white; they mentioned the par- 
ents’ occupation more frequently (often 
with a negative connotation); their parents
were described in generally positive terms;
they tended to be from large families; the
size of the family was one of the problems
that the teachers felt was confronting the
child; the child was more likely to have a
speech impediment or use baby talk, to be
more dependent and quieter, to be lacking
in self-confidence, to be dependent in work,
and to need help from the teacher, to have
a generally positive attitude toward school,
and to have poorly developed verbal skills.
Again, these girls had significantly lower
Metropolitan Readiness Test scores than
their classmates. 

Even more than the boys, the girls in the 
concern group showed a pattern of low
achievement combined with dependency 
upon the teacher for both emotional and 
academic support. The data for both boys 
and girls in the concern group in this study 
suggest that concern students are low 
achievers who depend on the teacher to tell 
them what to do continually. This depend- 
ency is expressed in inhibited, socially ap- 
proved ways; concern children apparently 
do not present behavioral problems and are 
not assertive or intrusive like children in 
some of the other groups. 

Teachers describe boys in the indifference 
group as more likely to be blond-haired, to 
have a “blank” facial expression, to be im- 
mature, to be neat and clean, to have a 
working mother (which the teacher per- 
ceives as a problem), to be reared by 
grandparents or substitute parents, to have 
a disinterested or uncooperative parent, to 
have a visual impairment or need glasses, 
to have a speech impediment and/or speak 
in baby talk, to be a sociometric loner, to
be anxious to please, to be in poor health, to 
have negative attitudes toward school, to 
have failed to live up to the teachers’ initial 
expectations, and to have poor verbal skills. 
Also, in summative response categories 
about indifference boys, the teachers more 
often mentioned physical descriptions, more 
often commented negatively upon their 
health, more often made negative comments 
about their work-related behavior, and
made more total negative comments about 
them. Despite this rather consistently nega- 
tive pattern, which included several nega- 
tive comments about abilities and work 
habits, the boys in the indifference group 
did not differ significantly from other boys 
on the Metropolitan Readiness Test. 

Some of the contrasts between concern 
boys and the indifference boys are striking. 
Both groups were perceived as being of low 
ability (especially the concern boys), but 
the two groups of boys were perceived as 
responding to teachers in contrasting ways. 
Whereas the concern boys showed depend- 
ency upon the teachers, the indifference 
boys apparently did not respond to the 
teacher in ways that were rewarding to
them. Even though the teachers perceived 
the needs of these children quite clearly and 
even though they rated them as anxious to 
please, they responded to them with indif- 
ference rather than with concern. Clues in- 
dicating possible reasons for this appear in 
some of the other teacher perceptions. In 
particular, the teachers reported that indif- 
ference boys had a “blank” facial expres- 
sion (suggesting that they did not respond 
to teacher overtures) and that they had 
poor attitudes toward school (the indiffer- 
ence group was the only group to be so 
described). They were also described as 
having failed to meet the teachers’ initial 
expectations for them. Apparently, on the 
basis of their appearance and of observa- 
tions of their early work, the teachers had 
developed moderate to high expectations for 
these boys. Apparently, however, they re- 
sponded inappropriately (from the teachers’ 
point of view, at least) to teacher overtures 
(blank expression, “negative” attitudes), 
thus conditioning the teachers to stay away 
from them.
The latter data are reminiscent of the 
findings of Yarrow, Waxler, and Scott 
(1971), who found that teachers returned 
more quickly and more frequently to a
child who had given them a positive re- 
sponse to their previous overture than to a 
child who had failed to give such a positive 
response. Positive responses to teacher 
overtures are apparently experienced as re- 
warding by the teachers, so that children 
can condition teacher approach or avoid- 
ance by either producing or failing to pro- 
duce, respectively, positive responses to 
these overtures. Apparently this phenome- 
non was operating in these classrooms, and 
it must have exerted a strong primacy effect
in conditioning the teachers to avoid and 
become indifferent toward these boys, even 
though they clearly perceived their needs 
for help in certain areas. The teachers ap- 
parently felt that these boys wanted to be 
left alone and/or that they disliked the 
teachers or school, so they began to avoid 
them. 
The girls in the indifference group were 
described as more likely to be nonwhite 
than white, as not liking school, as easily
giving up when working on assignments, as
lacking in self-confidence, as not being pre-
pared for school, as not knowing their
colors or numbers, as being creative and 
imaginative, as being of generally low abil- 
ity, as being children that the teachers did 
not frequently interact with, and as pre- 
senting problems in their classroom behav- 
ior. Teacher perceptions of deficient readi- 
ness skills in these girls were borne out by 
the girls' significantly low Metropolitan 
Readiness Test scores. Here again, the 
teachers clearly perceived that these chil- 
dren were in need of help, but they re- 
sponded with indifference rather than with 
concern. 

Again, the apparent reason is the re- 
sponse of the children to the teacher. 
Whereas the concern group combines low 
ability with dependency and positive re- 
sponse to the teacher, the indifference group 
combines low ability with misbehavior in 
the classroom and negative attitudes to- 
ward school. The major difference between 
the two groups seems to be a negative re- 
sponse to the teacher and/or to school in 
general among the indifference students. 
When teachers encounter such attitudes and 
do not succeed in changing them, they tend
to respond with indifference of their own. 
Apparently, this is not a benign indifference 
which occurs just because a child is nonsa- 
lient and the teacher is too busy interacting 
with classmates to notice him. Instead, in- 
difference of this sort seems to be a defense 
mechanism to protect the teacher from con- 
tinued frustration and rejection by the in- 
difference-group students. Such unrespon- 
sive and/or sullen behavior by students 
during early interactions with the teachers 
tends to “turn off” the teachers, condition-
ing them to minimize their interactions with
these students in the future and to develop
an attitude of indifference rather than con-
cern about their problems, even though
these problems are accurately perceived.

Boys in the rejection group were de-
scribed as more likely to be nonwhite than
white, as coming from intact families in
which both parents were living, as imma-
ture and poorly adjusted, as independent,
as loud or disruptive, as being relatively
inactive and not vivacious, as unlikely to 
be assigned as a leader or helper, as having 
difficulty in getting along with others, as 
being talkative, as not knowing likenesses 
and differences, as not knowing how to
write their names, as not knowing left from 
right or how to stay within lines on tablets 
(readiness skills), as unable to use or keep 
up with school supplies, as having poor 
reading abilities, as needing extra help be- 
cause of generally low ability, as being 
likely to fail or to have to be withdrawn 
from school, as having deteriorated in their 
work since the beginning of the year, as 
being either notably healthy or having no- 
tably poor health, as having poor verbal 
skills, as being physically unattractive, as 
presenting classroom behavior problems, as 
generally lacking in school readiness, as 
presenting problems in their behavior dur- 
ing work assignments, and as being of gen- 
erally low ability. Although the teachers 
remembered both more positive and more 
negative contacts with these children, they 
had more total negative comments about 
them, fewer total positive comments about 
them, and more total comments about them 
(positive plus negative). Despite this ex- 
tremely detailed and almost unremittingly 
negative picture of the rejection-group 
boys, including several explicit statements 
to the effect that they were of low ability, 
these boys did not differ significantly from 
other boys in Metropolitan Readiness Test 
scores.

Just as teacher reactions to the attach- 
ment-group students suggest a generally 
positive halo effect, the reactions to the re-
jection-group boys suggest a generally neg-
ative one. The halo effect induced by their
intense dislike of the rejection-group boys 
even caused the teachers to misjudge (un- 
derestimate) seriously their intellectual 
abilities, even though these teachers gener-
ally were remarkably accurate in judging
student potential (Willis, 1972).

The contrast with students in other
groups is also instructive. Instead of being
quiet and dependent upon the teacher like
the concern students, the rejection boys are
assertive and often loud and disruptive;
and they present frequent classroom disci-
pline problems. Their interpersonal prob-
lems are not confined to their interactions
with the teachers, as with indifference stu- 
dents; they are also described as having 
difficulty in getting along with their class- 
mates. Thus, the problems that these boys 
present to the teachers are frequent and 
serious enough to cause the teachers to re- 
spond with rejection rather than with con- 
cern. In this connection, it is noteworthy 
that, although both the concern and the re- 
jection boys are described as in need of 
readiness work and extra help in general, 
only the rejection boys are described as 
likely to fail or to have to be withdrawn 
from school. 

This implies that the teachers had posi- 
tive expectations of success in their efforts 
to work with concern students but did not 
have such expectations for whatever efforts 
they may have been making to give help to 
rejection students. Also, it should be kept in 
mind that concern students had generally 
low abilities as measured by the readiness 
tests, while the rejection boys did not differ 
significantly from other boys. Apparently, 
boys who present sufficiently severe disci- 
pline problems create such a strong nega- 
tive halo effect that teachers also attribute 
low abilities and lack of readiness to them, 
despite evidence to the contrary. This is the 
only group in which teacher perceptions of 
ability did not match readiness test data. 
Apparently, the frustration and aggrava- 
tion caused by these boys was sufficient to 
impair teacher judgment in their case. 
Thus, they underestimated these boys’ in- 
tellectual abilities and were pessimistic re- 
garding their chances for academic success, 
even though they were optimistic about the 
academic success of concern-group boys 
who actually did have lower abilities. 

Concerning the girls in the rejection 
group, the teachers frequently mentioned 
negative family patterns (broken home or 
poor parental cooperation), and they de- 
scribed the girls as being busybodies, not 
liking school, giving up easily, lacking self- 
confidence, being playful and mischievous, 
not being prepared for school, not being ad- 
justed to school routines, not knowing 
colors and numbers, not knowing likenesses
and differences, being alert and close ob-
servers, not volunteering information to the
class, failing to pay good attention, being 
likely to fail or to have to be withdrawn 
from school, being able to achieve more 
than they were doing, having poor school 
attitudes, having poor school readiness, 
having poor work habits, and having poor 
general ability. Again, the teachers made 
significantly more negative comments, sig- 
nificantly fewer positive comments, and sig- 
nificantly more total comments about the 
rejection girls than about girls in the other 
groups. As with the boys, however, despite 
the teachers’ repeated and detailed com- 
ments about low abilities, the rejection girls 
did not differ significantly from their class- 
mates on Metropolitan Readiness Test 
scores. 

Although it shows up in somewhat differ-
ent variables, the general pattern concern-
ing rejection-group girls is quite similar to,
and has the same implications as, the pat-
tern for boys. In contrast to other girls, the
rejection-group girls appeared to present 
more behavioral and disciplinary problems, 
to be underachievers, and in general not to 
“go along with the program.” As with the 
boys, the teachers tended to rate the rejec- 
tion-group girls as having low abilities (al- 
though they sometimes described them as 
underachievers rather than as low-ability 
children), even though they did not differ 
significantly from other girls in readiness 
test scores. The contrast between concern- 
group girls and rejection-group girls paral- 
lels the same contrast for the boys. Concern 
girls have low abilities but rejection girls do
not, although teachers think that they do. 
The teachers apparently were rewarded by 
the concern-group girls’ behavior and devel- 
oped concern and the habit of spending 
time with them, but they were apparently 
put off by the behavior of the rejection stu- 
dents and developed an attitude of rejection 
toward them. 


Discussion 


Teacher reactions to the four types of 
students studied in this research are readily 
explainable on the basis of the behavior of 
the students themselves (as perceived by
the teachers). The three major variables in- 
volved seem to be the students’ general 
level of school success, the degree to which 
they reward teachers in their personal con- 
tacts with them, and the degree to which 
they conform to classroom rules. Attach- 
ment students were compliant and success- 
ful in school, and they apparently rewarded 
teachers in their interactions with them. 
Concern students had difficulty in school 
but apparently were compliant and person- 
ally rewarding to the teachers, so that 
teachers became concerned about them and 
spent much time providing remedial help. 
The teachers’ negative attitudes toward in- 
difference and rejection children led them to 
underestimate their ability and learning po- 
tential. However, they did not respond with 
concern even when they accurately per- 
ceived the needs of these students, appar-
ently because they were “turned off” by the
students’ personalities and behavior. 
The indifference students apparently re- 
sponded negatively to the teachers, failing 
to provide a rewarding interpersonal con- 
tact pattern, so that the teachers became 
indifferent and gradually spent less and less 
time with these children, even though they 
perceived them as needing extra help. The 
rejection students not only failed to provide 
rewarding experiences to the teachers in
their interpersonal contacts with them; 
they also frequently caused classroom dis- 
turbances and were general discipline prob- 
lems. The teachers responded to this by re- 
jecting the students to the point of wanting 
to get rid of them and by projecting a num-
ber of traits onto them, especially low abili-
ties, which they did not, in fact, possess (at
least not as a group).

Teacher perceptions were generally accu-
rate for the first three groups, but the chil-
dren of the rejection group apparently were
sufficiently threatening to the teachers to
impair the accuracy of their perceptions
and to cause them to project inappropriate
and incorrect attributes onto them. This ex-
tended even to teacher judgments of stu-
dent ability, which were usually quite accu-
rate.

All in all, the data of this study fit nicely
with the findings from Feshbach's (1969)
study of teacher preferences for different 
types of students and from previous studies 
which related teacher assignment of stu- 
dents to one of these four attitude groups to 
teacher-student interaction patterns (Bro- 
phy & Good, 1974; Good & Brophy, 1972; 
Jenkins, 1972; Silberman, 1969). They ex- 
tend this line of research by providing 
more information about the personal quali- 
ties of the students assigned to these four 
groups. 

The data on concern and rejection stu- 
dents are particularly instructive, because 
previous research had found them to be 
quite similar, raising the question of why 
teachers responded to one group with con- 
cern while responding to the other group 
with rejection. The present findings suggest 
that these two groups are in fact different, 
although their differences have not shown 
up clearly in the student behavior measures 
included within previous observational re-
search. Previous interaction analysis re-
search had identified compliance versus
misbehavior as one apparent difference be-
tween concern- and rejection-group stu-
dents, although the findings did not suggest
anything resembling the extreme difference
suggested by the present findings. Perhaps
actually existing extreme differences were
not picked up in previous research for some
reason, or perhaps teachers in the present
study were exaggerating the actual differ-
ences as a defense mechanism. Probably 
both factors were operating to some degree. 

The present study has also provided some 
positive descriptions of the qualities of the
indifference-group students. Previous re-
search had identified only low frequency of
contact with the teacher as a predictable
characteristic of this group. The present re-
search has identified several additional
characteristics which help form an explana-
tion for the low frequencies of teacher con-
tact with indifference-group students. The
findings suggest that a measure of the de-
gree to which these students provide teach-
ers with rewarding responses in their inter-
personal contacts with them (as in the
study by Yarrow, Waxler, & Scott, 1971)
would show a significant difference between
the students in this group and their class-
mates.

Across attitude groups, a major conclu- 
sion of this research is that the particular 
relationship between a teacher and an indi- 
vidual student is crucial in affecting the 
teacher's attitudes toward that student, in-
dependent of such general student charac-
teristics as achievement, race, sex, etc. It
appears that children who do not reward
teachers are avoided and/or rejected by
them. Also, although some relationships do
exist, the attitudes of teachers are for the 
most part independent of student achieve- 
ment. Expectations are quite closely tied to 
student achievement, but attitudes appear 
to be more closely related to the personal 
qualities of the student and to his reaction 
to the teacher. Thus, a high achiever is not 
necessarily going to be liked nor is a low 
achiever necessarily going to be rejected. 
Depending upon the student’s response to 
the teacher, a high achiever can just as 
easily be treated with indifference, and a 
low achiever can easily become the object 
of teacher concern rather than teacher re- 
jection. 
The present study is focused on the dif-
ferential effects of immediate- versus de- 
layed-reward instructions on the creative 
thinking of two economic levels of elemen- 
tary school children. Ward, Kogan, and 
Pankove (1972) did not find any difference 
in the fluency dimension of creative think- 
ing when using immediate and delayed re- 
wards. However, they suggested that there 
may be a difference in using immediate and 
delayed rewards when the criteria of creativ- 
ity are extended to include originality, flexi- 
bility, or elaboration. Furthermore, since it 
would be almost impossible to reward the 
subject after each original, flexible, and elab- 
orate response, it was more practical to 
manipulate reward instructions than to re- 
ward the subject after each creative re- 
sponse. 


1 This article is based on a doctoral dissertation 
submitted to the University of Georgia in 1973. 
The author is grateful to Paul Torrance, his major 
professor, for his guidance and assistance. 

This article has been accepted for Southeastern 
Psychological Association, 1974. 

2 Requests for reprints should be sent to Roger
A. Johnson, School of Education, Old Dominion 
University, Norfolk, Virginia 23508. 


Another gap in knowledge lay in the dif- 
ferential effectiveness of reward conditions 
on economic class with creative thinking as
the dependent variable. Although no studies
have been done on the relationship between
reward, scores on a creativity test, and eco-
nomic status, several studies have been pub-
lished on delayed gratification as it relates
to different socioeconomic classes.
Investigators are divided on whether mid-
dle-class subjects make more delayed-re-
ward selections than lower-class subjects.
Studies by Levy (1968), Maitland (1966),
and Walls and Smith (1970) indicated that
middle-class subjects make more delayed-
reward selections than lower-class subjects.
On the other hand, Shipe and Lazare (1969)
failed to find any significant differences in
the ability of different socioeconomic classes
to delay gratification.
The present study explored whether there
were differences in the way members of dif-
ferent economic classes responded to re-
wards, with creative thinking as the de-
pendent variable. The questions, whether
reward conditions interact with the child's
grade level and whether any differences exist
on the Figural Form A of the Torrance Tests
of Creative Thinking (TTCT; Torrance,
1966) between economic classes and/or
grade levels, were also investigated.


METHOD


Design and Statistics 


The three-group, randomized subjects, posttest- 
only design was used (Campbell & Stanley, 1966). 
Subjects in Grades 3, 4, and 5 were assigned to one 
of two experimental groups or to a control group 
by reference to a table of random numbers. 

The immediate-reward group was told that if 
they worked hard on the game, they would each 
receive six prizes following the game. The delayed- 
reward group was given the same instructional set 
except that they were told they would have to wait 
one week to receive the prizes. After the students 
understood the instructions, the test was admin- 
istered according to the standard instructional set 
as found in the Directions Manual and Scoring 
Guide for the TTCT Booklet A (Torrance, 1966). 
The control group was administered the test under 
identical conditions with no mention of reward. 

Scores on Fluency, Flexibility, Originality, and 
Elaboration of the TTCT were the dependent 
variables. The Fluency score represents the total 
number of relevant responses produced; Flexibility,
the number of different approaches or categories
found in the responses; Originality, the unusual- 
ness and creative strength of the responses; and 
Elaboration, the number of different ideas produced 
to elaborate responses. The reliability and validity 
of the TTCT is extensive and satisfactory (Tor- 
rance, 1966, 1972). The data were analyzed using 
unweighted means analysis and a 3 × 3 × 2 anal-
ysis of variance design (Glass & Stanley, 1970).
Posttest scores were subjected to an analysis of
variance to estimate main and interaction effects.
The predetermined level of significance was .05. 
Subjects and Data Collection


One hundred and forty-five children (84 whites
and 61 blacks) who attended Grades 3, 4, and 5 
at Comer Elementary School in rural Georgia 
served as subjects. Fifty-two subjects whose family 
income qualified them for a free lunch were classi- 
fied as disadvantaged. Ninety-three subjects not 
qualified for a free lunch were classified as rela- 
tively advantaged. Slightly more blacks than whites 
were classified as disadvantaged. 

Subjects within each grade level were divided 
into two groups, namely, disadvantaged and rela- 
tively advantaged, with each subgroup further 
divided at random into three treatment levels 
(immediate reward, delayed reward, no reward). 
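
The crossing of grade level, reward condition, and economic status described above can be sketched as a short degrees-of-freedom calculation. This is only an illustration under the assumption of a fully crossed between-subjects design with all 145 subjects retained; the variable names are hypothetical and not taken from the study. Under that assumption the error term has 127 degrees of freedom, which agrees with the df = 2/127 reported for the F tests.

```python
# Sketch: degrees of freedom for a fully crossed 3 x 3 x 2 between-subjects design.
# Assumption: every subject contributes one score and all cells are occupied.
levels = {"grade": 3, "reward": 3, "economic_status": 2}  # hypothetical labels
n_subjects = 145  # total sample size reported in the text

# Number of cells is the product of the factor levels.
n_cells = 1
for n_levels in levels.values():
    n_cells *= n_levels

# Error (within-cell) degrees of freedom: subjects minus cells.
error_df = n_subjects - n_cells

# Main effect of reward and the reward x economic status interaction.
reward_df = levels["reward"] - 1
interaction_df = (levels["reward"] - 1) * (levels["economic_status"] - 1)

print(n_cells, error_df, reward_df, interaction_df)
```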

Because of the test author's recommendation 
that disadvantaged children react better when 
tested in small groups and because it would be 
easier to assist subjects in small groups to record 
picture titles, the researcher tested each of the two 
economic subgroups within the treatment levels
in turn. Six white, trained graduate student assist- 
ants were assigned at random to administer the 
test to each of the six tested groups to control for 
the extraneous tester variable. 


RESULTS AND DISCUSSION


The means and standard deviations for 
Fluency, Flexibility, Originality, and Elab- 
oration are shown in Table 1. The fifth- 
grade subjects had the highest means on all 
four dimensions of the TTCT. The fourth- 
and fifth-grade subjects had approximately 
the same means for Flexibility and Origi- 
nality; the third and fourth graders had es- 
sentially the same scores on the Fluency 
dimension. Elaboration scores increased 
steadily from the third through the fifth
grade. Subjects in the no-reward condition
had the lowest scores on all four dimensions
of the TTCT. There was no consistent dif-
ference between means for the immediate-
and delayed-reward groups. The disadvan-
taged subjects performed slightly better on
Fluency, while the advantaged subjects had
a slight edge on Flexibility, Originality, and
Elaboration.


TABLE 1


MEANS AND STANDARD DEVIATIONS FOR MAIN EFFECT SUBSETS: FLUENCY, FLEXIBILITY, ORIGINALITY,
AND ELABORATION


                                  Fluency       Flexibility     Originality     Elaboration
Subset                     N     M      SD      M      SD       M      SD       M      SD
Grade level
  Third grade             56   20.43   4.15   13.42   2.83    20.92   4.75    34.60   8.06
  Fourth grade            41   20.91   4.70   15.44   3.25    25.39   7.35    39.02  14.60
  Fifth grade             48   22.65   4.55   15.94   2.23    25.71   6.42    41.04   5.75
Reward condition
  Immediate reward        51   23.06   3.99   15.55   2.30    25.22   3.80    43.87   6.00
  Delayed reward          44   23.10   3.83   16.76   2.02    28.27   4.34    38.69  12.28
  No reward               50   17.83   3.28   12.49   2.63    18.54   6.52    32.10   8.18
Economic status
  Disadvantaged           52   22.08   5.74   14.65   3.59    22.51   7.95    37.13  11.95
  Relatively advantaged   93   20.58   2.31   15.21   2.10    25.51   4.02    39.31   8.08

All four dimensions of creative thinking 
were subjected to an analysis of variance to 
determine if the observed differences in 
means were significant and to examine pos- 
sible interaction effects. On all four dimen- 
sions of Figural Form A of the TTCT, there 
was a main effect for reward condition, in- 
dicating that one or more of the reward con- 
ditions were different from each other 
(Fluency, F = 7.89, df = 2/127, p < .01; 
Flexibility, F = 7.98, df = 2/127, p < .01;
Originality, F = 8.08, df = 2/127, p < .01;
Elaboration, F = 3.88, df = 2/127, p < .05).
A Newman-Keuls multiple-comparison test
indicates that the immediate- and delayed-
reward conditions were not significantly dif-
ferent from each other. Subjects in the no- 
reward condition scored significantly lower 
than subjects in the other two reward con- 
ditions. There were no significant differences 
in the main effects for grade level and eco- 
nomic status. 
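
As a rough aid to reading the reward main effects, the reported F ratios can be converted to partial eta squared with the standard identity F·df1 / (F·df1 + df2). This index is not reported in the article; the sketch below is an illustrative addition, and the function and variable names are hypothetical.

```python
# Sketch: partial eta squared recovered from a reported F ratio and its
# degrees of freedom. Assumption: each F is the ratio of the effect mean
# square to the error mean square with df = 2/127, as printed in the text.

def partial_eta_squared(f_value, df_effect, df_error):
    """Return F*df_effect / (F*df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios for the reward main effect as printed in the text.
reported_f = {
    "Fluency": 7.89,
    "Flexibility": 7.98,
    "Originality": 8.08,
    "Elaboration": 3.88,
}

effect_sizes = {
    name: partial_eta_squared(f, 2, 127) for name, f in reported_f.items()
}

for name, eta in effect_sizes.items():
    print(f"{name}: partial eta squared = {eta:.3f}")
```

On these figures the reward condition accounts for roughly 11% of the otherwise unexplained variance in Fluency, Flexibility, and Originality, and roughly 6% in Elaboration.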

The only significant interaction was re-
ward condition with economic status, which
is depicted for Fluency in Figure 1. The
performance of disadvantaged subjects was
significantly higher under both reward con-
ditions, while the performance of the rel-
atively advantaged subjects was slightly
higher in the no-reward condition. The inter-
actions for Flexibility, Originality, and
Elaboration were also significant and quite
similar to the graph depicted in Figure 1
(Fluency, F = 7.43, df = 2/127, p < .01;
Flexibility, F = 3.68, df = 2/127, p < .05;
Originality, F = 3.92, df = 2/127, p < .05;
Elaboration, F = 4.39, df = 2/127, p < .05).

Figure 1. Interaction of reward condition and
economic status for Fluency.

The intercorrelations of the four TTCT 
scores were as follows: Fluency with Flexi- 
bility was .72; Fluency with Originality was 
.65; Fluency with Elaboration was .25;
Originality with Elaboration was .25; Orig-
inality with Flexibility was .31; and Flexi-
bility with Elaboration was .38. All inter-
correlations of the four TTCT scores
reached significance (df = 143, p < .01).
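
The significance claim for the intercorrelations can be checked with the usual conversion of a correlation to a t statistic, t = r·sqrt(df / (1 − r²)). The sketch below is an illustration, not the authors' procedure. It reads the two coefficients printed without a decimal point as .25 and .31, and it uses an approximate two-tailed .01 critical value of 2.61 for df = 143, which is an assumption taken from standard t tables rather than from the article.

```python
import math

# Sketch: test each reported intercorrelation against zero.

def t_from_r(r, df):
    """t statistic for a Pearson correlation with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

DF = 143                  # degrees of freedom reported in the text
APPROX_CRITICAL_T = 2.61  # assumed two-tailed .01 critical value for df = 143

# Correlations as reported (two values restored to .25 and .31).
correlations = {
    ("Fluency", "Flexibility"): 0.72,
    ("Fluency", "Originality"): 0.65,
    ("Fluency", "Elaboration"): 0.25,
    ("Originality", "Elaboration"): 0.25,
    ("Originality", "Flexibility"): 0.31,
    ("Flexibility", "Elaboration"): 0.38,
}

t_values = {pair: t_from_r(r, DF) for pair, r in correlations.items()}

for (a, b), t in t_values.items():
    verdict = "exceeds" if t > APPROX_CRITICAL_T else "does not exceed"
    print(f"{a} with {b}: t = {t:.2f} ({verdict} the assumed critical value)")
```

Even the smallest coefficient, .25, gives a t of about 3.09, so under these assumptions all six correlations clear the .01 threshold, as the text states.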

It is an accepted postulate of social learn- 
ing theory that behavior potential is a func- 
tion of the reinforcements that the subject 
expects to receive as a result of engaging in
that behavior. In the present study, receiv-
ing prizes for “working hard” on a creativity
test appealed mostly to the disadvantaged
subject. Apparently, the consequences of
working hard (prizes) were viewed as being 
worth the effort only by the disadvantaged. 
This might be attributable to less access to 
such prizes and, hence, less satiation. 

The fact that reward instructions were 
differentially effective with different eco-
nomic classes has an important educational
implication. If the economic status of the
subject is an integral part of the moti-
vational techniques that can be used success-
fully, then it is important to determine ex-
actly which reinforcers are most effective
with each economic class. Once this has been
determined, school programs can be de-
veloped in accordance with the unique rein-
forcement patterns of the subjects enrolled.
The effects of behavioral objecti
process were investigated using a co. 
130 college subjects were administe 
assigned to an example-only, 

or an objective-rule-example trea 
meet a minimum criterion perform: 
ced the number o 
task and in: 


but significantly 
number of examples. Objectives an 


the requirement for reasoning ability. 


It seems that even though educational 
psychologists (Bloom, 1956; Bobbit, 1924; 
Tyler, 1951) had been stressing the need for 
precise statements of instructional objectives 
for many years, it was not until Mager 
(1961) published his book on preparing ob- 
jectives that the educational community 
started to take instructional objectives seri- 
ously. Since Mager’s book, many people 
have mounted the bandwagon and filled the 
literature with articles extolling the virtues 
of instructional objectives. However, there 
are those (Ebel, 1967; Eisner, 1967a; Jack-
son & Belford, 1965; Kliebard, 1968) who
question the value of objectives and feel 
they might actually be a hindrance to the 
design of instruction. After an interchange of 
views in the literature, Eisner (1967b) re- 
sponded to his critics by pointing out that 
the contribution of educational objectives to 
curriculum construction, teaching, and 
learning is an empirical problem, while most 
articles that have been written are merely 


1 This research was supported by the Advanced 
Research Projects Agency of the Department of 
Defense and was monitored by the Office of Naval 
Research under Contract N00014-67-A-0126.


AVAILABILITY OF OBJECTIVES AND/OR
RULES ON THE LEARNING PROCESS


the little research that has been done is at
best inconclusive. In a recent review of current
research on the effects of presenting behavioral
objectives to students, Duchastel
and Merrill (1973) suggest that future research
should focus on interactions between
the availability of objectives and both task
characteristics and individual differences.

The purpose of this study was to investigate
what effects the presentation of behavioral
objectives would have on the learning
process. Specifically, this study was conducted
to further clarify (a) how the presentation
of objectives would affect subjects’
performance on criterion measures; (b) how
other task characteristics would interact
with the presentation of objectives; and (c)
how individual aptitudes would interact
with the presentation of objectives.

It was hypothesized that objectives would
serve as orienting stimuli which dispose the
student to attend to, process, and organize
relevant aspects of displayed information
in accordance with the stated objectives.
Therefore, the presentation of objectives
was expected to reduce the number of examples
and the amount of time required to
learn the task, facilitate performance on
transfer-retrieval criterion measures, and
reduce the requirements for memory and
reasoning abilities. However, it was expected
that these effects would be tempered by or
interact with other properties of the learning
task. If objectives are inserted in a task
that has minimal orienting or organizing
stimuli, then the above hypothesized effects
should be very evident. On the other hand,
if objectives are inserted in a task which
has other effective orienting stimuli such as
rules, then the objectives would be somewhat
redundant and have a more subtle effect.


METHOD 


Subjects 


The 160 subjects who participated in this study 
were taken from four sections of an introductory 
educational psychology course and three sections 
of a science education course at the University of 
Texas at Austin. All subjects were required to 
participate as a class assignment. However, 30 of 
the original subjects were eliminated because they 
failed to complete all three phases of the study. 


Ability Measures 


A battery of six cognitive ability tests was ad- 
ministered to all subjects. The battery consisted 
of three tests selected from the Kit of Reference 
Tests for Cognitive Factors (French, Ekstrom, & 
Price, 1963) and three task-relevant tests developed 
for this study. The task-relevant tests required the 
subjects to process the same type of information 
that must be processed in the learning task, while 
the published tests required similar processes on 
information not related to the task. A list of the
individual tests and their factor designations appears
in Table 4.


Experimental Tasks and Materials 


The learning task used in this study consisted
of a hierarchical imaginary science called the Science
of Xenograde Systems. The structure and
content of the task were similar to those of formal
science topics, but the imaginary nature of the
science assured that none of the subjects had any
previous experience with the task. In the version
of the science used for this study, a Xenograde system
consists of a nucleus with an orbiting satellite.
The satellite is composed of small particles called
alphons which also may reside in the nucleus. The
subject matter of the science deals with the principles
or rules by which the activity of satellite and
alphons may be predicted. The terminal objective
of the task requires that subjects predict and record
the state of the alphons and satellite of a
Xenograde system at successive time intervals
given the initial state of the system at time zero.

The instructional program consisted of 10 modules.
The materials for each module included (a)
a statement of a subobjective, (b) a statement of
a rule, (c) five examples of the rule, and (d)
five short, constructed response tests. The examples
were in the form of partial Xenograde tables which
showed the activity and relationships of a Xenograde
system at several points in time.

The instructional program was written in the
Coursewriter II language and presented to the
subjects by the IBM 1500/1800 computer-assisted
instruction system.


Procedure 


After the administration of the cognitive ability 
test battery, the subjects were randomly assigned 
to one of four groups: an example-only group 
(n = 32), an objective-example group (n = 33), a
rule-example group (n = 32), or an objective-rule-
example group (n = 33).

In learning the science, subjects in the example- 
only group received an example of the first rule of 
the science displayed on a cathode ray tube. After 
studying the example, each subject responded to a 
three-item, constructed response test which re- 
quired him to predict certain values using the rule 
inferred from the example. If the subject responded 
correctly to two out of the three test items, he was 
given an example of the next rule in the sequence. 
Otherwise, he was given another example of the
same rule followed by another three-item test.
This sequence of new examples followed by a test
continued until the subject responded correctly to
two of the three test items or received five examples.
The task was completed after all 10 rules
of the science were learned to the required criterion.
A posttest was administered immediately
following completion of the learning task, and
retention and transfer tests were administered two
weeks later.
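
The example-and-test cycle described above can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name and the per-item success probability `p_correct` are hypothetical and do not come from the study, and "two out of the three" is read as "at least two." The sketch shows why a subject can receive no fewer than 10 and no more than 50 examples.

```python
import random


def run_learning_task(p_correct, n_rules=10, max_examples=5,
                      items_per_test=3, pass_mark=2, rng=None):
    """Simulate one subject and return the total number of examples received.

    p_correct is a hypothetical probability of answering one test item
    correctly; it is an assumption made for illustration only.
    """
    rng = rng or random.Random()
    total_examples = 0
    for _rule in range(n_rules):
        for _attempt in range(max_examples):
            total_examples += 1  # one example display for the current rule
            n_right = sum(rng.random() < p_correct
                          for _ in range(items_per_test))
            if n_right >= pass_mark:
                break  # criterion met, move on to the next rule
    return total_examples


best_case = run_learning_task(p_correct=1.0)   # passes every first test
worst_case = run_learning_task(p_correct=0.0)  # uses all five examples per rule
print(best_case, worst_case)
```

The lower bound of 10 corresponds to the "minimum number (10) of trials" referred to in the Results.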

The subjects in the other three groups learned 
the science by the same basic procedure except for 
the following treatment differences. The objective- 
example group was shown a statement of a sub- 
objective on an image projector while the corre- 
sponding example was displayed on a cathode ray 
tube. The rule-example group was shown a state- 
ment of the rule corresponding to each example, 
and the objective-rule-example group received 
both the objective and the rule in addition to the
example.


RESULTS


In addition to total scores on the six cog- 
nitive ability tests, posttest, retention test, 
and transfer test mentioned in the previous 
section, data were obtained for each subject 
on the following criteria: total number of 
examples required to learn the science, dis- 
play latency, test-item response latency, and 
total latency. Display latency was the total 
time the subject spent studying the examples


TABLE 1
GROUP MEANS AND STANDARD DEVIATIONS OF THE
NUMBER OF EXAMPLES REQUIRED TO LEARN
THE TASK


Objectives. 


No 
Yes 


and, depending upon the subject’s treatment 
group, the corresponding rules and/or ob- 
jectives. Test-item response latency was the 
total time required by the subject to respond 
to the three-item tests following each exam- 
ple display. Total latency was merely the 
sum of the display and the test-item response 
latencies. 

The descriptive statistics and reliabilities 
of the ability tests, posttest, retention test, 
and transfer test are reported elsewhere 
(Merrill, 1970). 

The group means and standard deviations 
for the number of examples required to learn 
the task are presented in Table 1. The re- 
sults from a two-factor analysis of variance 
revealed a significant rule effect (F = 48.7, 
df = 1/126, p < .001) wherein the presenta- 
tion of rules reduced the number of examples 
required to learn the task. A significant ob- 
jective effect (F = 4.7, df = 1/126, p < .05) 
shows that the presentation of objectives 
also reduced the number of examples re- 
quired, but this reduction was not nearly as 
marked as the reduction caused by the presentation
of the rules. An examination of
the group frequency distributions with num- 
ber of examples as criterion revealed that 
73% of the subjects in the objective-rule- 
example group, 62% of the subjects in the 
rule-example group, and 21% of the sub- 
jects in the objective-example group learned 
the task in a minimum number (10) of tri- 
als. In contrast, only 3% of the subjects in 
the example-only group learned the task in 
10 trials. 
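
To make the phrase "not nearly as marked" concrete, the two reported F ratios can be converted to an effect-size index. The sketch below uses partial eta squared, which is not reported in the article and is introduced here only as an illustrative descriptive measure; the function name is hypothetical. It relies on the standard identity eta_p^2 = (F x df_effect) / (F x df_effect + df_error).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F ratio and its degrees of freedom to partial eta squared."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for number of examples.
rule_effect = partial_eta_squared(48.7, 1, 126)
objective_effect = partial_eta_squared(4.7, 1, 126)

print(f"rules: {rule_effect:.3f}  objectives: {objective_effect:.3f}")
```

Under this index the rule effect accounts for roughly 28% of the relevant variance and the objective effect for under 4%, which agrees with the verbal description.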

The means and standard deviations for
each group on the three latency measures
may be found in Table 2. The latency measures
were also analyzed using a two-factor
analysis of variance. A significant rule effect
was obtained on all three measures (F =
21.9, df = 1/123, p < .001, for display latency;
F = 48.8, df = 1/126, p < .001, for
test-item response latency; and F = 39.2, df
= 1/123, p < .001, for total latency), with
the rule groups taking considerably less time 
to study the displays and to respond to the 
criterion items. The objective effect was sig- 
nificant (F = 12.8, df = 1/126, p < .001) 
only on test-item response latency, with the 
objective groups requiring less time to re- 
spond to the test items than nonobjective 
groups. There also was a significant inter- 
action (F = 4.2, df = 1/126, p < .05), with 
test-item response latency as criterion. This 
interaction indicates that the objectives had 
a greater effect in reducing response latency 
when added to a task which had no other 
focusing or organizing stimuli than they did 
when added to a task which had other ef- 
fective orienting stimuli such as rules. In 
other words, the difference in response 
latency between the example-only and ob- 
jective-example groups was greater than 
the corresponding difference between the 
rule-example and objective-rule-example 
groups. 
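
The interaction can be checked by hand from the test-item response latency means in Table 2. The following sketch simply computes the two differences named above and their difference; the dictionary layout and names are illustrative choices, not part of the original analysis.

```python
# Test-item response latency means in seconds, taken from Table 2.
means = {
    ("no rule", "no objective"): 923.3,  # example-only
    ("no rule", "objective"): 649.3,     # objective-example
    ("rule", "no objective"): 493.7,     # rule-example
    ("rule", "objective"): 419.8,        # objective-rule-example
}


def objective_saving(rule_level):
    """Seconds saved by adding objectives at a given level of the rule factor."""
    return (means[(rule_level, "no objective")]
            - means[(rule_level, "objective")])


saving_without_rules = objective_saving("no rule")
saving_with_rules = objective_saving("rule")
interaction_contrast = saving_without_rules - saving_with_rules

print(round(saving_without_rules, 1),
      round(saving_with_rules, 1),
      round(interaction_contrast, 1))
```

Objectives saved about 274 seconds when no rules were shown but only about 74 seconds when rules were present, a difference of about 200 seconds between the two savings.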

Since the experimental procedure required 
all subjects to perform at a minimum criterion
level on each rule before proceeding to
the next rule, no group mean differences were 
expected on the posttest. The confirmation 


TABLE 2
GROUP MEANS AND STANDARD DEVIATIONS FOR
DISPLAY LATENCY, TEST-ITEM RESPONSE LATENCY,
AND TOTAL LATENCY IN SECONDS

                          Display               Test-item response    Total
Group                     M            SD       M        SD           M            SD
Example-only              [illegible]  379.8    923.3    430.7        1772.8       [illegible]
Objective-example         [illegible]  377.5    649.3    253.9        1513.5       [illegible]
Rule-example              543.9        208.6    493.7    211.7        1037.6       [illegible]
Objective-rule-example    634.3        218.2    419.8    128.2        [illegible]  [illegible]


TABLE 3
GROUP MEANS AND STANDARD DEVIATIONS FOR
POSTTEST, RETENTION TEST, AND TRANSFER
TEST SCORES

                          Posttest         Retention test   Transfer test
Group                     M       SD       M       SD       M       SD
Example-only              45.5    4.9      44.2    7.2      11.0    2.8
Objective-example         44.3    11.7     43.3    13.9     13.2    4.4
Rule-example              45.1    12.9     43.6    14.2     14.1    5.9
Objective-rule-example    47.8    12.2     46.2    12.2     14.7    5.0

of this expectation made it possible to attri- 
bute any group differences on retention or 
transfer to the differential treatments rather 
than to differential posttest performance. 
The posttest, retention test, and transfer test 
means and standard deviations may be 
found in Table 3. Even though the rule 
groups received significantly fewer examples 
and took significantly less time to learn the 
task, their performance on the transfer test 
was significantly higher than that of the 
no-rule groups (F = 7.8, df = 1/126, p <
.01). The objective effect did not reach significance
at an acceptable level, but it did
approach significance (F = 3.1, df = 1/126,
p < .10), with the objective groups obtaining
higher mean transfer scores than the no-objective
groups. However, there were no
significant group mean differences on the
retention test.

The battery of cognitive ability tests was 
factor analyzed, but consistent with previous 
findings (Bunderson, Olivier, & Merrill, 
1971), it was not possible to separate the 
factors of induction and general reasoning. 
Therefore, a two-factor varimax solution
which yielded the factors of reasoning and
associative memory is presented in Table 4.
The reasoning factor is marked by the two 
induction and the two general reasoning 
tests. 
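
One way to see which tests mark each factor is to flag the salient loadings in Table 4. The sketch below does this with a cutoff of .50, which is an assumed convention for illustration and not a criterion stated in the article; the test names and loadings are copied from Table 4, and the helper name is hypothetical.

```python
# Rotated loadings from Table 4: (reasoning, associative memory).
LOADINGS = {
    "Memory of Number Series": (0.1877, 0.8336),
    "First and Last Names Test": (0.0078, 0.8465),
    "Bi-Column Number Series": (0.6001, 0.0802),
    "Letter Sets": (0.7006, 0.1954),
    "Tote Mobile": (0.7458, 0.1607),
    "Ship Destination": (0.8191, -0.1103),
}
FACTORS = ("reasoning", "associative memory")
CUTOFF = 0.50  # assumed threshold for a salient loading


def marker_tests(loadings, cutoff=CUTOFF):
    """Group tests under every factor on which they load at or above cutoff."""
    result = {name: [] for name in FACTORS}
    for test, row in loadings.items():
        for factor, loading in zip(FACTORS, row):
            if abs(loading) >= cutoff:
                result[factor].append(test)
    return result


for factor, tests in marker_tests(LOADINGS).items():
    print(factor, "->", ", ".join(tests))
```

With this cutoff, four tests fall on the reasoning factor and two on associative memory, and no test loads saliently on both.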

Regression analyses of the individual
ability test scores, factor scores, and the
criterion measures were conducted. A significant
Ability × Treatment interaction (F
= 3.16, df = 3/122, p < .05) was obtained
using test-item response latency as the criterion
measure and reasoning factor scores
as the covariable. Test-item response latency
had a high negative relationship to
reasoning, as defined by the reasoning factor 
scores, for subjects in the example-only 
group. However, the corresponding relation- 
ship between reasoning factor scores and 
test-item response latency was significantly 
reduced (F = 9.28, df = 1/124, p < .01) for 
subjects in the other three treatments. 
Similar ability by treatment interactions 
were obtained using individual reasoning 
test scores as covariables with test-item re- 
sponse and total latency as criterion. 


DISCUSSION


The design of the present study was such 
that all subjects were required to reach a 
minimum criterion performance at each 
level of the task before they were allowed to 
go on to the next level. This procedure was 
used to assure that all treatment groups 
would perform at the same level on the 
posttest. Unless all groups learned the original
task equally well, differential performance
on retention or transfer measures could
not be interpreted in terms of the organization
or structure provided by the instructional
treatments. The results confirmed the
expectation of nonsignificant group dif- 
ferences on posttest performance. 

Since there was a negligible decrement in 
performance between the posttest and re- 
tention tests for all treatment groups, the 
retention interval of two weeks may have 
been too short for the treatments to have 
had an effect on retention. However, con- 
trary to the learning by discovery hypothe- 
sis (Bruner, 1961), the presentation of rules 


TABLE 4
VARIMAX ROTATION FACTOR MATRIX

                              Factor loadings
Test                          Reasoning    Associative memory
Memory of Number Series       .1877        .8336
First and Last Names Test     .0078        .8465
Bi-Column Number Series       .6001        .0802
Letter Sets                   .7006        .1954
Tote Mobile                   .7458        .1607
Ship Destination              .8191        -.1103


facilitated performance on the transfer task.
Even though the rule groups received signifi- 
cantly fewer examples and took significantly 
less time to learn the task, their performance 
on the transfer test was significantly higher 
than that of the no-rule groups. It seems 
that precisely stated rules have a greater 
effect on transfer retrieval than objectives. 
The weak objective effect may have been
due to the fact that the objectives only
specified that transfer retrieval would be
required to solve new problems using previously
demonstrated relationships.

An examination of the group frequency
distributions with number of examples as
criterion showed that the presentation of
rules enabled most subjects to learn the
science in a minimum number (10) of trials
and, therefore, with nearly zero errors. Objectives
had a similar but less pronounced
effect. Since the rule treatments brought 
such a high percentage of subjects to perfect 
performance in terms of the number of ex- 
amples required, the full impact of these 
treatments, using number of examples as 
criterion, was indeterminate. However, the 
within-group variance was not similarly 
restricted in the latency criterion measures. 

The hypothesis that the presentation of 
rules would reduce the amount of time re- 
quired to learn the task was supported by 
significant rule effects on all three latency 
measures. The presentation of objectives 
did not have the hypothesized effect of re- 
ducing the total time required to complete 
the task. This result would seem to contra- 
dict the argument that objectives have a 
focusing effect if it were not for the re- 
duction in the number of examples required 
by the objective treatments. A comparison of 
the component latency measures, display 
latency and test-item response latency, re- 
vealed that objectives either increased or 
had no effect on display latency but signifi- 
cantly reduced test-item response latency. 
Apparently, the presentation of objectives 
affected the efficiency and effectiveness of 
the subject’s information processing and 
thereby facilitated his performance on the 
criterion test items. 

The presentation of objectives and/or 
rules did significantly reduce the relationship
between reasoning factor scores and
test-item response latency. Why the treat- 
ments interacted with reasoning abilities us- 
ing test-item response latency and total 
latency as criteria and did not interact sig- 
nificantly with display latency and number 
of examples as criteria is not clear. Ap- 
parently, reasoning abilities are more crucial 
during those stages of the task where sub- 
jects respond to the criterion test items. 

The hypothesis that objective effects 
would be greater between the example-only 
and objective groups than between the rule 
and rule-objective groups was supported only
by the significant interaction found
with test-item response latency as criterion.
However, an examination of the means for 
the other criteria shows that the correspond- 
ing differences between the means are con- 
sistent with the hypothesis. Thus, it is im- 
possible to make broad or general statements 
about the effect of objectives on the learning 
process without taking into account the 
other stimulus properties of the task. 

On the basis of the results of this study, it 
was concluded that objectives have orient- 
ing and organizing effects which dispose 
students to attend to and organize relevant 
information and thus facilitate performance 
on criterion-test items constructed in ac- 
cordance with the objectives. However, these 
effects are not as pronounced when the learn- 
ing task contains other orienting stimuli 
such as rules.


INFLUENCE OF MODE OF PRESENTATION, ETHNICITY, AND
SOCIAL CLASS ON TEACHERS' EVALUATIONS OF STUDENTS


MARY JENSEN
Dalhousie University, Nova Scotia

LAWRENCE B. ROSENFELD
University of New Mexico


After seeing, hearing, or seeing and hearing videotapes of middle- and
lower-class Anglo, Black, and Chicano students, teachers evaluated
students on a set of 15 semantic-differential scales dealing with teachers’
classroom evaluative criteria. A 3 × 3 × 2 analysis of variance
for repeated measures revealed significant F tests for all 15 scales.
Patterns in the data showed that Anglos were rated most favorably.
Middle-class Anglos and Blacks were rated more favorably than lower-
class Anglos and Blacks, respectively; however, class made no difference
in the ratings of Chicanos. More cues were available in the audio
mode of presentation than in the visual mode.


The aim of this study was to investigate
the possible influence of ethnic and social
class stereotyping on teachers’ judgments of
students as well as the way in which these
stereotypes might be transmitted. Rosenthal
and Jacobson (1968) touched off a great
deal of research on expectancy effects with
the publication of their controversial book,
Pygmalion in the Classroom. In their research,
Rosenthal and Jacobson manipulated
teachers’ expectations by providing
teachers with information concerning students’
capabilities for academic achievement.
However, in many classrooms, initial
expectations for students’ classroom behavior
may result from social stereotypes. If
stereotypes do influence teachers’ expectations
for student achievement and classroom
behavior, then it follows that teachers
will devalue Black and Chicano students in
line with their ethnic stereotypes. A number
of studies offer support for this idea
(Whitehead & Miller, 1972; Williams,
Whitehead, & Miller, 1971; Woodworth &
Salzer, 1971). Social class, like ethnicity,
may also serve as a basis for stereotyping,
and numerous investigations have documented
the negative stereotypes which portray
lower-class students (Becker, 1952;
Miller, 1973; Rosen, 1969; Sexton, 1961;
Sewell, Haller, & Strauss, 1969).


1 The project reported herein was performed
pursuant to a grant from the National Institute
of Education, U.S. Department of Health, Education,
and Welfare (NIE Grant NE-G-00-3-0039).
However, the opinions expressed herein do not
necessarily reflect the position or policy of the
National Institute of Education. No official
endorsement by the National Institute of Education
should be inferred.

2 Requests for reprints should be sent to Mary
Jensen, School of Physical Education, Dalhousie
University, Halifax, Nova Scotia, Canada.

Ethnic and social class cues are transmitted
both visually and vocally. Secord
(1958), Clifford and Walster (1973), and
Williams, Whitehead, and Miller (1971)
offer evidence supporting the idea that
ethnic and social class cues are transmitted
visually. Buck (1968), Anisfeld, Bogo, and
Lambert (1962), Naremore (1971), and
Williams and Naremore (1971) provide
evidence that ethnic and social class cues
are transmitted paralinguistically, that is,
through the nonverbal, vocal properties of
speech. The aim of this study was to
investigate the effects of mode of presentation,
students’ ethnicity, and students’ social
class upon teachers’ judgments of students.


METHOD 


Teachers were placed into groups based
upon mode of stimulus presentation (i.e.,
audio, visual, or audio-visual) and were
then exposed to videotapes of students from
different ethnic and social class backgrounds.
After exposure to the stimulus
tapes, teachers rated each student on 15
semantic-differential scales dealing with
classroom evaluative criteria.


Subjects 


Twelve junior high and 4 senior high schools 
were randomly selected from a list of schools in a 
large (population = 300,000), southwestern city 
public school system. One hundred and sixty-eight 
teachers from these schools either volunteered or 
were chosen by their principals to participate in 
the study since school administrators would not 
allow a random selection of teacher subjects to be 
drawn. Experimental conditions were randomly
assigned to schools. 

In order to have equal ns for the analysis, the 
responses of 12 subjects were randomly discarded 
from two groups leaving a total of 156 subjects or 
52 subjects per mode of presentation group. Forty- 
nine of the subjects were male; 107 were female. 
One hundred and thirty-one of the teachers were 
Anglo, 13 were Chicano, 6 listed “other” as their 
ethnic membership (excluding Anglo, Black,
Chicano, and Indian), and 6 did not respond to
the question. The teachers’ average age was in
the response category of 30-39 years and the
average number of years of teaching experience
was 9.08. Eighty-five of the teachers held at least
a bachelor's degree; 66 held at least a master’s
degree; 3 had no college degree; and 2 had PhDs.


Measuring Instruments 


During May 1972, a mail questionnaire was sent
out to a randomly selected group of public school
teachers. Thirty-seven per cent of the teachers
responded, giving lists of the evaluative criteria
they used to judge students in the classroom.
These lists contained both social and academic
criteria. Since most of the concepts were highly
evaluative in nature, the semantic differential was
selected as the questionnaire format to be used in
this study (Darnell, 1970).

Concepts for the 15 semantic-differential scales
were chosen from lists generated by public school
teachers. The most frequently occurring concepts
were submitted to a separate group of public
school teachers who judged them for their
relevance to the classroom. The 15 concepts which
resulted from this process were as follows: participates
in class, has a good attitude, exerts a great
deal of effort, attends regularly, performs well on
tests, is highly motivated, is cooperative, works
well independently, is very intelligent, follows
directions, is responsible, is courteous, is very
creative, has a good self-concept, and is neat. For
the final instrument, six different forms of these
15 concepts were prepared (one form for each of
six students appearing on each videotape). On each
form, both the polarity and the order of the scales
were randomized to reduce order effects. Finally,
the order of the forms themselves was randomized.
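
The form-construction steps described above can be sketched in a few lines. This is a hypothetical illustration, not the procedure the authors actually used to randomize: the function name, the polarity labels, and the use of a seeded pseudorandom generator are all assumptions. The 15 concepts are those listed in the text.

```python
import random

CONCEPTS = [
    "participates in class", "has a good attitude",
    "exerts a great deal of effort", "attends regularly",
    "performs well on tests", "is highly motivated", "is cooperative",
    "works well independently", "is very intelligent",
    "follows directions", "is responsible", "is courteous",
    "is very creative", "has a good self-concept", "is neat",
]


def make_forms(n_forms=6, seed=None):
    """Build rating forms with randomized scale order and polarity.

    Each form is a list of (concept, polarity) pairs; the polarity label
    records on which side the favorable pole is printed (labels assumed).
    """
    rng = random.Random(seed)
    forms = []
    for _ in range(n_forms):
        order = CONCEPTS[:]
        rng.shuffle(order)  # randomize the order of the scales
        form = [(concept,
                 rng.choice(("favorable pole left", "favorable pole right")))
                for concept in order]
        forms.append(form)
    rng.shuffle(forms)  # randomize the order of the forms themselves
    return forms


forms = make_forms(seed=1)
print(len(forms), len(forms[0]))
```

Every form contains all 15 concepts exactly once, so only their order and polarity differ from form to form.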

Because of the evaluative nature of the concepts
generated, it was believed that social
desirability might influence teachers’ responses to
the students. As a check for this possibility, the
Crowne-Marlowe (1964) Social Desirability Scale
was included as an additional dependent measure
in the study.


Stimulus Materials 


The videotapes used in the study were copied 
from those used by Williams in previous studies 
(Williams, Whitehead, & Miller, 1971). Fifth- and 
sixth-grade boys representing Anglo, Black, and 
Chicano ethnic groups and middle and lower social 
classes were filmed in interview situations in which 
the boys were asked to discuss their favorite television
shows and games. Each of the ethnic by
social class conditions was represented on each
of the three tapes used in the study. Thus, 18 different
boys (6 on each tape) served as stimuli.
Each boy was individually interviewed by an
Anglo female in her mid-twenties and the tapes
contained edited portions of these interviews which
were approximately two minutes in duration. All
of the boys were neatly dressed, most of them in 
slacks and sport shirts. All of the boys were 
selected from schools in or near Austin, Texas; 
therefore, their speech reflected regional variations 
typical of that area. 
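
The stimulus layout can be enumerated to confirm the counts given above. The sketch below is illustrative only; the variable names are hypothetical, and the tape numbers 1 to 3 are arbitrary labels.

```python
from itertools import product

ETHNICITIES = ("Anglo", "Black", "Chicano")
CLASSES = ("middle", "lower")
MODES = ("audio", "visual", "audio-visual")
TAPES = (1, 2, 3)  # arbitrary labels for the three tapes

# Each tape carries one boy for every ethnicity-by-class condition.
conditions = list(product(ETHNICITIES, CLASSES))
stimuli = [(tape, ethnicity, social_class)
           for tape in TAPES
           for ethnicity, social_class in conditions]

# Crossing the mode of presentation with the conditions gives the full design.
design_cells = list(product(MODES, ETHNICITIES, CLASSES))

print(len(conditions), len(stimuli), len(design_cells))
```

Six conditions on each of three tapes give the 18 boys, and crossing the six conditions with the three modes gives the 18 cells of the 3 × 3 × 2 design.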


RESULTS 


A 3 × 3 × 2 design (Mode of Presentation
× Ethnicity × Social Class) was analyzed
for each separate semantic differential
using an analysis of variance for
repeated measures (Games, 1972). A probability
level of .01 was selected as the basis
for determining significant differences in 
the analysis of variance and in the subse- 
quent Newman-Keuls multiple-comparison 
procedure which was used as the follow-up 
to pinpoint the specific cells involved in the 
effects found. Table 1 gives the results of the 
analyses of variance. 

A visual inspection of the graphed cell 
means revealed consistencies in the data for 
each of the interactions obtained. The dis- 
cussion which follows is based upon pat- 
terns which were consistent across scales 
for each of the significant effects obtained. 
For a complete analysis of the findings, in- 
cluding the F tables and the results of the 
Newman-Keuls procedures, see Jensen
(1973).


Main Effects 


Two main effects for ethnicity were obtained
(Figure 1). The graphed data indicated
that Anglo students were evaluated
more favorably than Black students and
that Black students were evaluated more
favorably than Chicanos (note that 1 =
most favorable rating; 7 = least favorable
rating). Comparison of cell means indicates
that for both scales, Anglos were rated significantly
higher than Chicanos, and on
Scale 7: Cooperation, Anglos were also
rated significantly higher than Blacks.

A main effect for social class was found
on Scale 3: Effort (Figure 2). Middle-class
students were rated significantly higher
than lower-class students on Effort.


TABLE 1
RESULTS OF THE ANALYSES OF VARIANCE

Main effect
  Ethnicity
    Scale 7: Cooperation
    Scale 12: Courtesy
  Class
    Scale 3: Effort

Double interaction
  Mode × Ethnicity
    Scale 3: Effort
    Scale 4: Attendance
    Scale 13: Creativity
  Mode × Class
    Scale 8: Works Independently
  Ethnicity × Class
    Scale 4: Attendance
    Scale 8: Works Independently
    Scale 11: Responsibility
    Scale 13: Creativity

Triple interaction
  Mode × Ethnicity × Class
    Scale 1: Participation
    Scale 2: Attitude
    Scale 5: Test Performance
    Scale 6: Motivation
    Scale 9: Intelligence
    Scale 14: Self-Concept
    Scale 15: Neatness

Note. All results are significant (p < .01).


Scale 7: Cooperation    Scale 12: Courtesy
FIGURE 1. Main effects for ethnicity. (Abbreviations:
A = Anglo, B = Black, and C = Chicano.)


Double Interactions


Mode × Ethnicity interactions were obtained
on three scales (Figure 3). In regard
to mode of presentation, the patterns in the
graphed data indicate that Anglos were
rated highest in the audio mode and Blacks
were rated most favorably in the visual
mode; mode of presentation made little difference
in the ratings for Chicanos. Mode of
presentation seemed to affect ratings for
Black students more than ratings for Anglo
or Chicano students, with Blacks in the
audio mode being rated significantly lower
than Anglo students in all three conditions
for all three scales. On Scales 4 (Attendance)
and 13 (Creativity), Black students
in the visual condition were rated significantly
higher than Black students in the
audio condition. Chicanos, except as already
mentioned, were not rated significantly
differently than Blacks.

A Mode × Class interaction was obtained
on Scale 8: Works Independently. Figure 4
shows that, except in the visual mode, middle-class
students were evaluated more favorably
than lower-class students. However,
only the difference in the audio-visual condition
was statistically significant.

Ethnicity × Class interactions were obtained
for five scales. Figure 5 shows that
for Anglo and Black students, teacher
evaluations decreased from middle- to
lower-social-class students. Evaluations for
Chicanos did not seem to reflect this trend.
Chicano middle- and lower-class students
and Black lower-class students were rated
similarly at the lowest end of the scale
(i.e., did not differ significantly from each
other), while Anglo middle-class students
always received the most favorable ratings.


Scale 3: Effort
FIGURE 2. Main effect for class. (Abbreviations:
M = middle and L = lower.)

FIGURE 3. Mode × Ethnicity interactions. (Abbreviations: Au = audio, V = visual, AuV
= audio-visual, A = Anglo, B = Black, and C = Chicano.)

Scale 8: Works Independently
FIGURE 4. Mode × Class interaction. (Abbreviations:
Au = audio, V = visual, AuV = audio-visual,
M = middle, and L = lower.)


Triple Interactions 


Triple interactions were obtained on six
scales. The data for all scales except Scale
15 (Neatness) shared patterns. Therefore,
what follows is a discussion of the patterns
for Scales 1 (Participation), 2 (Attitude),
5 (Test Performance), 6 (Motivation), and
14 (Self-Concept). Scale 15 (Neatness) is
discussed separately at the end of this
section.

In general, the triple interactions show 
patterns which are consistent with those 
obtained with the double interactions (Fig- 
ure 6). Anglo middle-class students in the 
audio and audio-visual conditions were 
rated more favorably than students in 
nearly every other set of conditions. Anglo
lower-class students were rated generally 
more favorably than both middle- and 
lower-class Chicanos and lower-class Blacks 
in all modes (except in the audio-visual 
mode on Scale 9). Black middle-class stu- 
dents fared better in the visual modes (vis- 
ual and audio-visual) than in the audio 
mode, while mode of presentation seemed to 
make little difference for Black lower-class 
students. Regardless of mode of presenta- 
tion and class, Chicanos were represented in 
the bottom half of the ratings on every 
scale. With only one exception (visual/
Chicano middle-class on Scale 9), none of
the differences for Chicanos were statis-
tically significant. Except in the audio mode,
Black middle-class students were evaluated
more favorably than Black lower-class stu-
dents.

Figure 6. Mode X Ethnicity X Class interactions. Panels shown include Scale 9:
Intelligence and Scale 14: Self-Concept. (Abbreviations: Au = audio, V =
visual, AuV = audio-visual, A = Anglo, B = Black, C = Chicano, M = middle, and L =
lower.)

For Scales 5 (Test Performance), 6 
(Motivation), and 9 (Intelligence), there 
was a narrower range of scores in the visual 
condition. For Scales 1 (Participation), 2 
(Attitude), and 14 (Self-Concept), there 
was an approximately equal range of rat- 
ings from one mode of presentation to 
another. 

Scale 15 (Neatness) did not share pat-
terns with other scales (Figure 7). Mode of
presentation had the most obvious effect on
this scale since students in the visual condi-
tion were rated approximately the same (at
the favorable end of the continuum, which
could be expected since students were
neatly dressed). Audio cues and audio-vis-
ual cues elicited a wider range of ratings
than visual cues alone.

Social Desirability


Teachers’ responses to the Crowne-Marlowe
Social Desirability Scale showed that,
as a group, teachers in this study had less
of a tendency to respond in a socially de-
sirable manner (X = 14.25, SD = 5[...])
than those in the normative group (X =
15.99, SD = 5.54; t = 3.63, df = 1573, p <
.001). The notion that teachers’ responses
were significantly related to the tendency to 
respond in a socially desirable manner was 
not verified. 


Discussion 


Results obtained on the 15 semantic dif- 
ferentials suggest that teachers in the test 
city have developed a pattern of responses 
to students which are affected by mode of 
presentation, ethnicity, and social class. The 
practical result of these stereotypic re-
sponses is differential treatment of stu-
dents. In other words, some students are
"more equal" than others.

In general, Anglos were rated higher than 
either Blacks or Chicanos. Three possible 
explanations may account for this finding. 
First, Cooper (1972) has shown that one’s 
own ethnic group is evaluated more favor- 
ably than other ethnic groups. Since roughly 
84% of the subjects were Anglo, the 
teachers may have been responding to stu- 
dents on the basis of ethnocentrism. 

A second possible explanation is that the 
ratings were a function of social stereotypes. 
The fact that ratings for Blacks and Chi- 
canos were differentiated from each other as 
well as from Anglos lends support to this 
explanation.

The third explanation is the social domin-
ance theory. This theory suggests that the
proportional size of a minority group to the
majority group will affect beliefs about the
minority group. As the size of the minority
group increases, its threat to the social and
economic dominance of the majority group
increases. In our test city, Chicanos com-
prise a considerably larger proportion of the
population than do Blacks; according to
this theory, we would expect Chicanos to be
rated lower on the scales than Blacks, and
they were. Regardless of the reason(s)
operating to produce these results, Anglo
students were regarded more favorably
than either Black or Chicano students on
3 of the 14 scales which were influenced by
ethnicity.

Students’ social class once again proved
to be a significant variable in determining
how students were rated. Middle-class Anglo
and Black students fared better than lower-
class Anglo and Black students. These find-
ings are consistent with stereotypes of the
poor as lazy and not caring about getting
ahead (Becker, 1952; Davis, 1972) and with
results of earlier studies dealing with the
evaluation of middle- and lower-class
speakers from vocal cues (Harms, 1961;
Moe, 1972; Naremore, 1971; Williams,
Whitehead, & Traupman, 1971).

Figure 7. Mode X Ethnicity X Class inter-
action. Panel: Scale 15: Neatness. (Abbreviations: Au = audio, V = visual,
AuV = audio-visual, A = Anglo, B = Black, C =
Chicano, M = middle, and L = lower.)

However, class made little difference in 
the ratings for Chicanos, who were con- 
sistently rated at the lowest end of the 
scales. Why ethnicity overrode class is un- 
known, though the large proportion of 
Chicanos in the area allows one to speculate 
that the social dominance theory may have 
been operating to produce lower ratings for 
Chicano middle-class students. 

Although no main effects for mode of 
presentation were obtained, mode of pres- 
entation did interact with both ethnicity 
and class to alter the ratings of students. In 
other words, how students were perceived 
affected how they were rated. While audio 
cues helped establish both ethnicity and 
class, visual cues seemed to be more useful 
in transmitting ethnic cues than class cues.
The visual-only condition seemed to pro-
vide fewer cues which could be used to
differentiate students than either the audio 
or audio-visual conditions. This finding is 
consistent with Buckingham’s (1972) con- 
clusion that the audio channel contains 
more information than the visual channel. 
In practical terms, this study provides 
evidence that educational equality for stu- 
dents from different ethnic and social-class 
backgrounds is a myth, at least in our test
city. Teachers’ expectations for students’ 
classroom behavior are affected by their 
stereotypes of different ethnic and social- 
class groups. The communication of these 
expectations to students through both verbal 
and nonverbal means is likely to act as a 
subtle yet powerful shaping mechanism for 
student behavior. This shaping of student 
behavior to teacher expectations may ac- 
count for a large proportion of the differ- 
ences in educational attainment for different 
ethnic and social-class groups. An additional 
finding of this study suggests that Black and 
Chicano students from lower social classes 
may be more favorably evaluated if they 
are seen and not heard. This suggestion 
comes from the finding that teachers seem 
to discriminate between students less in the 
visual mode of presentation than in either 
the audio or audio-visual modes. 


CONCLUSIONS 


While the following conclusions seem 
justified in light of the present data, caution 
should be exercised in generalizing these
conclusions to other populations. 

1. Anglo students are evaluated more 
favorably than Black or Chicano students. 
Chicano students are evaluated least favor-
ably.

2. For Anglo and Black students, class 
is a salient dimension in teacher evalua- 
tions. However, for Chicano students, class 
has little influence on teachers’ ratings, and 
ethnicity seems to be most relevant to 
teachers’ evaluations. 

3. The audio channel contains more in- 
formation for making classroom judgments 
than does the visual channel. 

4. For Anglo middle-class students,
evaluations are more positive when these
students are vocal; for Anglo lower-class
students, evaluations are more positive in
nonvocal situations. Black students are
rated more favorably in visual and audio-
visual conditions than in the audio-only
situation. Regardless of vocal or visual cues,
Chicanos are evaluated at the low end of the
scale in comparison to Anglo and Black
students.

5. Judgments of students are based rarely 
upon single dimensions. Rather, stereotyp- 
ing seems to be influenced by a number of 
considerations, including ethnicity and 
class. How cues for these two dimensions 
are received by the teacher (via audio, 
visual, or audio-visual channels) affects the 
importance attached to them.


[...] with direct
instruction via curriculum individualization on reading achievement.
Teachers in two classrooms were trained in the contingent reinforce-
ment of group survival skills (GSS); one teacher was given specific
instructions [...]. Children in all classrooms [...]
before and after intervention. Both GSS and DI approaches resulted in
improved reading achievement when compared [...].
[...] skill training increased the proportion of
children's survival-skill behaviors. It was concluded that achievement
may be increased directly through curriculum improvements or in-
directly through increasing behaviors which are prerequisites for
academic achievement.


Continuing investigation into the identifi- 
cation of precise, observable, and manip- 
ulable parameters controlling academic 
achievement is a vital prerequisite for estab- 
lishing successful classroom intervention 
procedures. One set of such parameters,
academic survival skills (Cobb, 1970,
1972a), has been found to be correlated with
elementary school achievement (Cobb, 1969,
1970, 1972b; Lahaderne, 1968; Myers,
Atwell, & Orpet, 1968). These "survival
skills" are specific classroom behaviors that
are not academic responses per se but are,
instead, the necessary basis for such aca-
demic responding. Examples include attend-
ing to teacher and volunteering to par-
ticipate in classroom discussions. More
recently, empirical evidence has established
a clear functional relationship between sur-
vival-skill behaviors and reading achieve-
ment in the first grade. Increased propor-
tions of survival-skill behaviors resulted in 
significantly increased scores on standard- 
ized achievement tests for children who 
were initially low on both survival-skill be- 
haviors and achievement (Cobb & Hops, 
1973) and for entire regular classrooms 
(Hops & Cobb, 1973). 

Other approaches have also produced 
increased academic achievement. Children 
who received direct instruction in reading,
particularly with emphasis on the sys-
tematic programming of phonics or decod-
ing skills, made significant gains in reading
(Bliesmer & Yarborough, 1965; Karnes,
Hodgins, & Teska, 1968; Waugh, 1969). In
an effort to understand the place of sur-
vival skills in academic achievement, the
present study was designed to determine if
the cause-and-effect relationship between
achievement and academic survival skills
was unidirectional. That is, do increases in
achievement effected through a direct [...]
on the curriculum and academic responding
result in concomitant changes in survival-
skill levels?
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ACADEMIC SURVIVAL-SKILL TRAINING 


In evaluating the relationship between 
academic survival skills, direct instruction, 
and elementary school achievement, the fol- 
lowing hypotheses were tested: (a) children 
who receive direct programmatic instruction 
in reading increase their achievement but 
not their survival-skill levels; and (b) chil- 
dren who receive academic survival-skill 
training increase their skill levels both in 
achievement and in survival-skill behaviors. 

In the present study, the effect of each 
of the two approaches (survival-skill train- 
ing and direct programmatic instruction) 
on first-grade reading achievement was 
compared with a no-treatment control 
group. 


METHOD 


Subjects 


The study was conducted in four first-grade 
classrooms in a school district of 21,000 pupils. 
The classrooms were randomly assigned to the 
experimental conditions. Eighty-one children, for 
whom complete pre- and postintervention read- 
ing achievement scores were obtained, comprised 
the entire sample. There were 20 pupils in the
control (C) classroom and 19 in the direct in-
struction (DI) classroom. The two remaining class-
rooms, both of which received group survival-
skill training (GSS), had 21 students each.


Achievement Tests 


The Gates-MacGinitie Primary A, yielding
Comprehension and Vocabulary scores, and the
Gates-MacGinitie Reading Readiness tests were
administered both prior to and six weeks after
the termination of the formal intervention pro-
cedures. The mean of the standard scores of the
three tests for each student was used as the de- 
pendent variable. The number of school days be- 
tween testings ranged from 72 to 77 days across 
the four classrooms. 


Observations 


During the same weeks in which achievement
testing occurred, observations were made of the
students' classroom behavior during all reading
periods for five consecutive days. An interactive
coding system (Cobb & Hops, 1971) was used
to record the behavior of all the children in a
prearranged manner. The observer recorded the
behaviors of each student for a minimum of two
consecutive eight-second intervals before going on
to the next student on the list. After all the chil-
dren had been coded once, the process was re-
peated so that the behavior of each child was
sampled about the same number of times during
each session.
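
The rotation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the roster labels, and the session length are hypothetical, and the sketch uses exactly two intervals per turn although the text gives two as a minimum.

```python
from itertools import cycle, islice

INTERVAL_SECONDS = 8     # length of one coding interval (from the text)
INTERVALS_PER_TURN = 2   # the text gives two as a minimum; fixed here for simplicity


def observation_schedule(students, session_seconds):
    """Return (start_second, student) pairs for one observation session.

    Students are visited in roster order, each for INTERVALS_PER_TURN
    consecutive intervals, and the roster is repeated until time runs out.
    """
    total_intervals = session_seconds // INTERVAL_SECONDS
    # One pass through the roster, with each student repeated per turn.
    one_pass = [s for s in students for _ in range(INTERVALS_PER_TURN)]
    turns = islice(cycle(one_pass), total_intervals)
    return [(i * INTERVAL_SECONDS, s) for i, s in enumerate(turns)]


# Hypothetical roster and session length, for illustration only.
roster = ["child_A", "child_B", "child_C"]
schedule = observation_schedule(roster, session_seconds=96)
```

With a session length that is a multiple of one full pass through the roster, every child is sampled the same number of times, which is the property the procedure aims for.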

Four full-time and two part-time observers
gathered data. Each new observer was required to
reach an 85% or higher level of agreement on
three occasions with an experienced observer before
her independent observations were considered re-
liable. Following training, the same level of agree-
ment was maintained by periodic reliability checks.

Observer reliability was calculated by dividing 
the total number of agreements by the total num- 
ber of agreements and disagreements. Throughout 
the study, reliability data were collected system- 
atically between observers. Based upon 27 paired 
observations, reliability ranged from 85% to 99% 
with a mean of 94%. 
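
The agreement index described above is a simple ratio. A minimal sketch follows; the function names and the counts in the example are hypothetical, and only the formula and the 85% criterion come from the text.

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements, as a proportion."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no coded intervals to compare")
    return agreements / total


def meets_criterion(agreements, disagreements, criterion=0.85):
    """True if a paired observation reaches the training criterion."""
    return percent_agreement(agreements, disagreements) >= criterion


# Hypothetical paired observation: 47 agreements, 3 disagreements.
example = percent_agreement(47, 3)
```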

Cobb (1970) reported significant correlations 
between first-grade reading achievement and three 
academic survival skills: attending (.45), volun-
teering (.59), and look around (-.41). For the
current study, attending was divided into two, 
newly defined, distinct categories: attending and 
work. To compute a student’s survival-skill level 
for the pre- and postintervention observation 
periods, the frequency of look around was sub- 
tracted from the summed frequencies of attend- 
ing, volunteering, and work. The obtained figure 
was divided by the total frequency of all behaviors 
and represented the proportion of academic sur- 
vival-skill behaviors in each child's repertoire 
during each measurement phase. 
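
As a worked illustration of this index, the sketch below computes the proportion from a table of behavior frequencies. The dictionary keys, the "other" category, and the counts are assumptions made for the example; the formula itself follows the description above.

```python
def survival_skill_level(counts):
    """(attending + volunteering + work - look around) / total frequency.

    `counts` maps each coded behavior category to its observed frequency;
    the total includes every category, not just the four named ones.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no behaviors recorded")
    positive = counts["attending"] + counts["volunteering"] + counts["work"]
    return (positive - counts["look_around"]) / total


# Hypothetical frequencies for one child in one measurement phase.
child_counts = {
    "attending": 30,
    "volunteering": 5,
    "work": 35,
    "look_around": 10,
    "other": 20,  # assumed catch-all for remaining coded behaviors
}
level = survival_skill_level(child_counts)
```

Because look around is subtracted rather than merely excluded, a child who looks around more often than he or she attends, volunteers, and works would receive a negative value.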


Intervention 


All of the experimental manipulations oc-
curred in the regular classroom setting. The pri-
mary focus was alteration of the teacher's be-
haviors, which were designed to enhance, during
intervention, and maintain, during follow-up,
favorable academic and/or survival-skill behaviors
in the children.

The group survival-skill program (GSS), based 
upon the work of Packard (1970), Patterson, 
Cobb, and Ray (1972), Walker, Mattson, and 
Buckley (1971), and Walker, Fiegenbaum, and 
Hops (1971), has been described in detail else- 
where (Cobb, 1972a; Cobb & Hops, 1973). Teacher 
training included the use of modeling, assigned 
reading, cueing, and daily feedback, all of which 
were faded out as the teacher acquired the skills 
and the children’s behavior improved. Components 
of the child-training procedures taught to and 
employed by the teachers were the pairing of 
group-nonsocial reinforcement with individual- 
and group-social reinforcement, vicarious rein- 
forcement, shaping procedures, close monitoring, 
and the withdrawal of nonsocial reinforcement by 
gradually increasing the criterion for such rein- 
forcement. The intervention period lasted 20 
school days, and the total consultant time for 
each of the two GSS classrooms was approximately 
12 hours.

The approach used in the direct instruction of 
reading (DI) was based upon the assumption that 
reading can be taught in a programmatic fashion 
by individualizing the curriculum. The task (read-
ing) is analyzed into a hierarchy of subskills; the
teacher identifies the entering behaviors or particu-
lar subskill levels of each child; the child is
taught the next task in the hierarchy; and finally,
the teacher determines by a preset criterion
whether the child has mastered one subskill before
proceeding to the next.

Individualizing Reading Instruction (Schmahl, 
Bowman, & Hops, 1973)* was a manual specifically 
prepared to help the teacher use the procedures. 
It contained a flowchart which listed in sequential
form all the components deemed necessary for 
the efficient teaching of reading. While it was de- 
signed to complement basal curriculums with 
phonics and vice versa, the greatest emphasis 
was given to the phonics components. Previous
research has shown that beginning reading pro- 
grams which stress phonics or sound-symbol re- 
lationships, even when curriculums are individu- 
alized and programmed, are superior to those that 
do not (Bliesmer & Yarborough, 1965; Bond & 
Dykstra, 1967; Chall, 1967). 

A short written test was administered to the 
teacher to ensure that she had learned the con- 
cepts presented in the manual. Then, using fre- 
quent observations, feedback, social praise, and 
modeling by the consultant, the teacher was first 
trained to work with three low-reading achievers 
and was subsequently allowed and encouraged to 
generalize her training to three similar children 
and, finally, to the entire classroom. The training 
period lasted 20 school days, and the consultant 
spent 17 hours working with the teacher. 

In lieu of direct service, the control teacher 
was provided with a graduate student in special 
education who acted as a teacher aide during the 
course of the study. The student was given no 
specific instructions and spent considerably more 
time in the control classroom than did any of the 
consultants in the experimental classrooms. 


RESULTS 


Preexisting differences between groups on 
the dependent variable have been discussed 
as a critical factor in evaluating change 
scores by Campbell and Stanley (1963). 
These authors suggest the analysis of co- 
variance as a statistical method of control- 
ling for such differences. Others (O’Conner, 
1973) have criticized the use of the analy- 
sis-of-covariance model. This criticism is 
particularly valid when assumptions under- 
lying the model, such as homogeneity of 
regression coefficients, are not met (Kirk, 
1968). In the present experiment, tests for
homogeneity of regression were conducted
on both dependent variables. The survival-
skill data produced an F value of 14.49
(df = 3/72, p < .001), indicating that the
homogeneity assumption was untenable; the
achievement data met the requirement with
an F < 1.


* Copies of this manual are available upon re-
quest.

In addition, the initial scores for survival 
skills and mean achievement (see Table 1) 
were subjected to one-way analyses of vari- 
ance to determine if pretreatment differ- 
ences existed. The F values indicated that 
there were no significant differences between 
the groups on the observation (F = 1.00, 
df = 3/77, p > .05) or achievement (F = 
2.39, df = 3/77, p > .05) data.

In the present study, no differences be- 
tween groups on either variable existed prior 
to treatment, and the data for one variable 
could not meet the assumptions underlying 
the analysis of covariance. Consequently, 
gain scores for both survival skills and 
mean achievement were used as the depend- 
ent variables and subjected to statistical 
analyses. Mean scores and standard devia- 
tions of both variables are presented in 
Table 1. 
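
The two dependent variables can be written out explicitly. The sketch below is illustrative only: the function names and the scores are hypothetical, and it assumes that a gain score is the posttreatment value minus the pretreatment value.

```python
def mean_standard_score(readiness, vocabulary, comprehension):
    """Mean of the three standard scores used as the achievement measure."""
    return (readiness + vocabulary + comprehension) / 3


def gain_score(pre, post):
    """Gain is assumed here to be posttreatment minus pretreatment."""
    return post - pre


# Hypothetical standard scores for one child.
pre_achievement = mean_standard_score(50, 48, 52)
post_achievement = mean_standard_score(60, 58, 62)
achievement_gain = gain_score(pre_achievement, post_achievement)

# Hypothetical survival-skill proportions for the same child.
survival_gain = gain_score(0.59, 0.73)
```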

To determine whether the classrooms 
which received the group survival-skill 
training made greater gains in survival-skill 
behaviors than those that did not receive the 
training, a one-way analysis of variance was 
carried out on the gain scores. Significant
differences between groups were found (F =
5.00, df = 3/77, p < .005). Post hoc com-
parisons using the Newman-Keuls test in-
dicated that the mean gains for both of the
survival-skill-training groups (GSS1 = .14,
GSS2 = .10) were significantly greater than
the gains made by the direct-instruction
classroom (DI = .04) or the control class-
room (C = .02). No differences were found
between the gain scores of the two GSS
classrooms.
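
For readers who want to see how an F ratio and its degrees of freedom arise in a one-way analysis of variance, a minimal pure-Python sketch follows. The function name and all data values are hypothetical; the raw gain scores from the study are not available here, so the example does not reproduce the reported F values.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Small hypothetical data set, chosen so the arithmetic is easy to follow.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
toy_f, toy_df_between, toy_df_within = one_way_anova_f(toy_groups)
```

With four classrooms of 20, 21, 21, and 19 pupils, the same function yields 3 and 77 degrees of freedom, which matches the df = 3/77 reported for the gain-score analyses.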

A similar analysis was conducted on the
achievement gains, that is, on the mean of
the three standard scores (Vocabulary,
Comprehension, and Readiness). Here, too,
a significant F value was obtained (F =
9.15, df = 3/77, p < .001). Post hoc anal-
yses using the Newman-Keuls procedure
showed that all three experimental groups
made significantly greater gains than the
control classroom (GSS1 = 12.9, GSS2 =
10.9, DI = 15.0, C = 7.6). However, the
gains made by the curriculum-training
classroom were significantly greater than
those by GSS2, but not GSS1.

TABLE 1
MEAN PRETREATMENT AND MEAN GAIN SCORES FOR SURVIVAL-SKILL BEHAVIORS
AND ACHIEVEMENT STANDARD SCORES

                      Survival skills                    M achievement
                 Pretreatment        Gain           Pretreatment        Gain
Group(a)    n      X     SD        X     SD           X     SD        X     SD
C          20     .63   .09       .02   .14         52.5   7.5       7.6   4.7
GSS1       21     .59   .05       .14   .08         48.2   6.5      12.9   4.8
GSS2       21     .55   .14       .10   .12         48.6   5.7      10.9   5.0
DI         19     .59   .14       .04   [illegible] 51.7   5.1      15.0   4.0

Note. Survival-skill scores = attending + work + volunteering - look around; achievement stand-
ard scores = (Readiness + Vocabulary + Comprehension) ÷ 3.
(a) Abbreviations: C = control, GSS1 = Group Survival-Skill Training 1, GSS2 = Group Survival-
Skill Training 2, and DI = direct instruction.

The first hypothesis, that children who 
received direct programmatic instruction in 
reading would increase their achievement 
level but not their survival-skill behavior, 
was supported. The curriculum-training 
classroom, while showing no gains in sur- 
vival skills, made significantly greater gains 
in achievement than both the control group 
and one of the survival-skill-training 
groups. The second hypothesis was also sup- 
ported: children in the group survival-skill- 
training program demonstrated greater 
gains in achievement and survival skills 
than did the children of the control class- 
room. Thus, the data show that survival- 
skill behaviors are increased by survival- 
skill training, and not by training in reading 
skills. Scores in reading achievement, in
contrast, were increased both by direct in-
struction in reading and also by survival-
skill training.


Discussion 


The results indicated that increases in 
reading achievement can be effected directly 
through well-programmed individualized 
reading instruction and indirectly through
programs which are designed to increase
children’s rates of specific classroom be-
haviors by enabling them to profit more
from the existing curriculum. Given an ex- 
perimentally unmanipulated basal curricu- 
lum, as in the survival-skill classrooms, 
gains in achievement may be established 
through behavioral improvements in the 
children. Or, given an experimentally un- 
manipulated level of survival skills, as in the 
direct-instruction classroom, achievement 
increases may be effected through improve- 
ment in the curriculum. While not tested 
in the present study, the most effective 
program would, presumably, combine both 
components; children would be taught to 
attend more and work more (survival 
skills) to the teacher's presentation of an 
improved curriculum (direct instruction). 

A notable limitation of the present study 
is the small number of classrooms involved 
in each of the conditions. Replications of 
these findings will be required before any 
firm conclusions can be reached about the 
relationship between academic survival 
skills, direct instruction, and academic
achievement as measured by standardized 
achievement tests. 

The findings of the present study, while 
preliminary, have further implications. 
Both sets of procedures can be of practical 
importance to the classroom teacher. 
Teachers with effective curricula may still 
require help in classroom management for 
increasing survival-skill behaviors of chil-
dren. On the other hand, children in some
classrooms may be working near maximum 
levels while using a curriculum which 
severely limits the progress individual chil- 
dren can make. The materials may be too 
difficult and frustrating for children at the 
lower end of the achievement distribution 
or too easy and boring for children at the 
upper levels. 

A second issue bears upon the response 
cost to the teacher of implementing these 
procedures. In the authors' experience, pro- 
grams which require less effort are more 
likely to be implemented effectively. How-
ever, increasing the efficiency of the class-
room by using either program can also in-
crease the workload of the teacher as she 
responds more to individual children's 
needs. Problems of this nature were beyond 
the scope of the present study but neverthe- 
less are critical issues. 

Another question remains to be answered 
by additional research. Will curriculum im- 
provements, such as the direct-instruction 
program used in the present study, have 
positive benefits only for children above a 
critical level of survival skills? Presumably, 
children who do not respond to the teacher’s 
controlling stimuli in a classroom would not 
respond maximally to the best available 
curriculum materials. Perhaps, more power- 
ful motivational programs (Hops, Walker, 
& Hutton, 1973) would be necessary prior 
to or in conjunction with improved cur- 
ricula. Further studies are required to deter- 
mine precisely what the interaction is 
between a child’s survival-skill level, cur- 
ricula, and increases in academic achieve- 
ment.


SIGNAL SYSTEMS OF LESSON SETTINGS AND THE TASK-
RELATED BEHAVIOR OF PRESCHOOL CHILDREN¹


JACOB S. KOUNIN²
Wayne State University


PAUL V. GUMP
University of Kansas


Thirty-six different teachers were videotaped teaching 596 lessons
in a special room. Lessons were seen as signal systems to participants.
These signal systems were characterized as varying along the dimen-
sions of continuity, insulation, and intrusiveness. Using task involve-
ment as the criterion, the most successful lessons are those in which
there is a continuing and protected signal system (as in individual
construction). Lessons of average success are those with a continuous
input from a constant source (books, records, teacher demonstrations).
The least successful lessons are those dependent upon discontinuous
inputs from other children (role play, group construction) and those
having high intrusiveness (gross motor activity, loud musical instru-
ments).


A major job of teachers is to create and 
maintain beneficial activity settings. Yet 
the amount of information developed by the 
research community concerning activity 
settings in schools or in other environments 
is minute (Moos, 1973). Studies regarding 
how children develop or what specific inter- 
ventions can do to their behavior and learn- 
ing abound, but there is a paucity of re- 
search about the impact of activity settings 
upon child behavior. 

Educators specializing in early childhood 

education appear to have some conventional 
wisdom about the kinds of activity settings 
to provide in a preschool or at least some 
consensus about the kinds of activities to 
provide. Most preschools, for example, pro- 
vide large blocks and coloring equipment, 


¹ This investigation is supported by PHS Re-
search Grant MH-15472 from the National In-
stitute of Mental Health. Lawrence Sherman di- 
rected the computer programming and provided 
considerable assistance in a variety of ways. The 
authors wish to express their gratitude to Norma 
Law, Sharon Elliott, and Hilda Weems of the 
Wayne State Nursery School for their intensive 
help and to the 36 student teachers who partici- 
pated in this research. 

² Requests for reprints should be sent to Jacob
S. Kounin, Department of Educational and Clin- 
ical Psychology, Wayne State University, Detroit, 
Michigan 48202. 


and none provide calculus texts or standard-
size basketball courts. Shure (1963) has
pioneered research showing differences in
such variables as popularity, population
density, sociality, and constructiveness
among different open settings of a preschool.
The authors, however, have been unable to
locate any research dealing with properties
of formal lesson settings in preschools as
these relate to the behavior of children
participating in the lessons.

This report is part of, and was generated
by, a research project investigating the pre-
school as a complex of activity settings. A
school may contain both open settings
(block corner, sandbox, art section with
easels) as well as prescriptive lesson set-
tings. Each activity setting has two aspects:
a particular milieu with its appropriate
behavior objects and a format or program
with a standing behavior pattern which fits
that milieu. Barker (1968) has provided
the generic name of synomorphs for such
ecological units.

The focus of this paper is upon the rela-
tionships between properties of formal
lessons and the behavior of child partici-
pants. Such formal lessons are parts of
school environments, and a lesson is
regarded as an activity setting inhabited by
teacher and pupils who are in the lesson
much as they might be in a playground
game or in a school play.

The research asks whether certain quali-
ties or dimensions of lessons can be deline-
ated, whether these qualities can predict to
the task-related behavior of children in
these lessons, and whether these predictions
can be made independently of the differ-
ences of the teachers and children who in-
habit these lessons.


METHOD 


Population, Setting, and Regime 


The study took place in a university nursery
school [...] group was heterogeneous with respect to age, sex,
race, and socioeconomic background. Two ex-
perienced, certified teachers were present at all
times. In addition, [...] present, three in the
afternoon.

The children, one half boys and one half girls, 
ranged in age from 29 to 65 months at the time 
they were first videotaped. The racial mixture 
consisted of 60% non-Caucasian (including an 
Indian, an Oriental, and a Mexican) and 40% 
Caucasian. The incomes of the children’s families 
ranged from below $3,000 a year to $75,000 a year. 

As part of the daily routine of the preschool, a 
preformed subgroup of children from each session 
was taken to a special room and was presented 
with a formal lesson by a student teacher. The
lesson groups were formed by the head teacher
on the basis of approximate chronological age but
were heterogeneous with respect to sex and race.
The number of children in a lesson group oc- 
casionally varied due to absences and ranged from 
three to nine; however, 87% of the 596 lessons 
contained from four to eight children. 

The lessons occurred in a limited and protected
segment of the total nursery school regime. Their
milieu was a 2.13 x 2.74 meter room which had
one entrance door, one window, and a one-way
observation window. A high shelf on the wall,
accessible only to the teacher, and a light bulb
hanging from the middle of the ceiling were the
only permanent furnishings in the room. (The
light was switched off by the teacher when the
short rest period prior to the lesson was initiated
and was switched on when the teacher stopped
the rest period and started the lesson proper.) The
room was empty except for those supplies that were
brought in by the teacher for a specific lesson. A
group was scheduled for a lesson period for ap-
proximately 20 minutes a day, including the brief
rest period.

The lessons were chosen and planned by the
student teachers. There was no particular unifying
theme from lesson to lesson except, perhaps,
judged appropriateness for the preschool-aged
child. There was no standard to stress language,
concepts, numbers, or any kind of "subject."
Rather, the lessons were diverse in content and 
format both between teachers and for any one 
teacher on different days. 

The lessons were recorded on videotape in order 
to have enduring and complete information that 
could be viewed as many times as required. A 
stationary video camera was attached to a corner 
about two meters from the floor, and a micro- 
phone was hung on the side having the one-way 
window. A wide-angle lens made it possible to 
view most of the floor space. The operator and 
the recording equipment were in a different room, 
behind the one-way window. 

The operator videotaped 596 lessons taught by 
36 different teachers (6 teachers per academic 
quarter; 3 for the morning and 3 for the after- 
noon groups). All teachers of any academic quarter 
taught all three groups an equal number of times, 
and all the lessons of a teacher's first two groups 
were recorded. The recording extended over a 
period of two years, and 87 different children were 
the unrehearsed participants. 

The videotaping of the lessons was an estab- 
lished routine for the two years the study was in 
progress. The investigator had no influence on the 
lessons taught nor did he provide any feedback 
to the teachers about their procedures or lessons. 

At this point we would like to interject some 
comments about the general procedure. We started 
with a curiosity but without any theory except the 
general notion that settings and lesson formats in-
fluence behavior. The approach was ecological in
that there were no external constraints either on
input data or on output data. Nevertheless, the
situation at the nursery school provided an inves- 
tigator with an opportunity to study uncontrived 
phenomena that met most of the requirements of 
experimentally created events. Variations in inde-
pendent variables occurred with sufficient fre-
quency and magnitude; background variables such
as the population and general school situation were 
constant; the populations participating in the les- 
sons were balanced in that the different teachers 
met the same groups the same number of times 
(for any one year, the nine teachers of the morn- 
ing session met the same lesson groups an equal 
number of times as did the nine teachers of the 
afternoon session). While there was a lack of con-
trol over “balancing” the lessons for lesson types,
the ecological method has the advantage of mili-
tating against any Hawthorne effect or the even
more pervasive problems involved in experimenter
effects in general (Orne, 1962; Rosenthal, 1966).
This methodology also avoids the different results
obtained from "subjects" in experiments as com-
pared to pupils in classrooms, as documented by
Kounin (1970). (One might also note that the
coders of child behavior did not know the theory
that was being tested and, at the time the coding
was taking place, that the researchers didn't ei-
ther.)


Variables and Their Measurement 


With possession of 596 lesson tapes, one inves- 
tigative problem became the quantification of these 
phenomena. How could these repeatedly viewable 
but free-flowing events be translated into quanti- 
fied variables so that relational questions could 
be answered? 


Dependent Variable 


Specification of the dependent variable was 
straightforward; since lessons were meant to en- 
gage or involve children, it became important to 
determine which varieties of lessons best held the 
attention of the children and supported the appro- 
priate behaviors. A child’s behavior was categor- 
ized in one of three ways: 

Appropriately involved. This was coded when 
the child was clearly with the official activity and 
in a manner appropriate to the thrust of the lesson 
at that time. The involvement could be passive, 
for example, listening to a story, watching a dem- 
onstration; or it could be active, for example, 
cutting pictures, pasting objects on cardboard, re- 
citing, singing, dancing. 

Not involved. This was coded when the child 
showed no overt signs of being with the activity 
yet was not misbehaving or behaving inappro- 
priately. 

Inappropriately involved or deviant. Inappro- 
priateness was coded when the child was involved 
in the activity but in a clearly unsuitable fashion, 
for example, pounding a magnet against a radiator 
instead of seeing what sticks to it, “messing” with 
paste instead of pasting green and red squares on 
a cardboard, racing about when supposedly skip- 
ping to put a valentine in a mailbox. (Making 
“mistakes” or performing poorly were not coded 
as inappropriate.) A deviant act was an intentional 
wrongdoing on the part of the child: interference 
with an ongoing lesson (grabbing the book during 
teacher’s reading a story); aggression against chil-
dren or property (hitting, pulling hair, throwing
equipment against the wall); interfering with legit-
imate work of other children (taking props away
while another was working); open defiance of the
teacher.

Each child was coded every six seconds from the 
time the lesson was started (excluding preparation 
time) until it was over. Intercoder reliabilities for 
three coders ranged from 93% to 96% agreement. 

The dependent variable, then, was the percent- 
age of off-task behavior in each lesson. This score 
was computed by dividing the number of six-sec- 
ond units with no involvement, inappropriate be- 
havior, or deviancy by the total number of six- 
second units coded for that lesson. 
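
As a rough illustration of the score just described, the following minimal sketch computes an off-task percentage from a list of coded six-second units. The category labels and the function name are hypothetical conveniences introduced here, not part of the original coding scheme, and the sample data are invented.

```python
from collections import Counter

# Hypothetical labels for the three coding categories described above.
INVOLVED = "appropriately_involved"
NOT_INVOLVED = "not_involved"
INAPPROPRIATE = "inappropriate_or_deviant"

# Categories that count toward the off-task score.
OFF_TASK = {NOT_INVOLVED, INAPPROPRIATE}


def off_task_percentage(codes):
    """Return the percentage of coded six-second units that were off-task.

    `codes` holds one entry per child per six-second unit for one lesson.
    """
    if not codes:
        raise ValueError("no coded units for this lesson")
    counts = Counter(codes)
    off_task_units = sum(counts[category] for category in OFF_TASK)
    return 100.0 * off_task_units / len(codes)


# Invented example: 10 coded units, 3 of them off-task.
example = [INVOLVED] * 7 + [NOT_INVOLVED] * 2 + [INAPPROPRIATE]
print(off_task_percentage(example))  # 30.0
```

The sketch pools all children's units within a lesson, which is one reading of "the total number of six-second units coded for that lesson."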


Independent Variables 


The categorization of the lessons was the major
challenge of this research. The ecological niches
that lessons represent offer numerous dimensions
which might be meaningfully related to pupil in-
volvement. The strategy in this research, first, was
to identify lesson types rather than single varia-
bles. Inferences about operative variables within
lesson types were made, but the initial effort was
to discriminate lessons as wholes rather than to
measure specific unidimensional variables within
lessons.

As one observes children in a lesson, it becomes
clear that their actions are prodded, oriented, and
supported by the external provisions of the lesson.
These provisions include the communications of
the teacher ("Let's see what sticks and what doesn't
stick to a magnet.") and the props that go with
the lesson (magnet, paper clips, pieces of paper
and cloth, nails). A lesson also includes the stand-
ing behavior pattern that goes with the lesson
(making piles of objects that stick or don't stick
to the magnet, listening to a story being read).

Those external provisions which signal these
standard actions can be labelled signal systems.
One of the principal ways in which lessons differ
is in the pattern of their signal systems. Further-
more, lessons’ signal systems are like the partici-
pation rules of a game: they are repetitive. One
can assume that when lesson type X appears a sec-
ond time, signal system X will also reappear; that
is, the external provisions supportive of standard
participant action which appeared in the first in-
stance will reoccur in the second.

We may hypothesize that the more continuous
and unlagging the provisions of a lesson, the
greater the task involvement of a group of chil-
dren. We may assume that all lessons are planned
to produce such continuity and related task in-
volvement. Lessons simply aren't programmed to
have nothing to do with nothing. The problem
then becomes one of seeing whether certain types
of lesson formats are more likely to produce a
continuous flow of appropriate signals than are
others. A reciprocal problem is to ascertain whether
certain formats increase the likelihood of lags or
lacunae that may reduce task involvement and
make the lesson more vulnerable to inappropriate
or deviant behavior.

Let us illustrate the concept of continuity. Com-
mon lessons in our data showed teachers reading
books or playing records to encircling chil-
dren. In terms of the signal system concept, this
format calls for a single, continuous source of sig-
nal emission. This format can be manned by one
central occupant as a signal source. Providing the
central occupant is adequate, this format produces
a continuing signal flow to child participants. This
format should yield high involvement or low off-
task behaviors.

A single, continuous emitter is also the signal
system when a teacher conducts a demonstration,
for example, shows how popcorn is made. This
latter format also contains persisting external
props which invite continuous watching. Low
off-task behavior should result.

In contrast to the single, continuous source is
a signal system that relies upon multiple, shifting
signal sources manned by other children. Such mul- 
tiple, shifting signal sources are present in formats 
calling for group discussions (general talk about 
community helpers), group projects (making a 
group mural on a felt board), or unrehearsed role 
play (playing passengers and driver in bus play). 
In these formats, a common element is that chil- 
dren provide major signals for one another. When 
these signals are inadequate (as is likely for the 
performances of unrehearsed children), such a for- 
mat is vulnerable to lags and faltering continuity. 
Relatively high off-task behavior should result. 


In line with this theory, the more familiar reci- 
tation format is similar to the above formats in 
that it also calls for multiple, shifting signal sources.
Each shift in the signal source means that a fail-
ure of signal is possible, and when sources are not
adequate, the shifts may result in breaks in the 
signal system. Pupil off-task behavior in these les- 
sons could be predicted to be relatively high. 

The success of a formal lesson with a group of 
children is related to two issues: one has to do with 
the delivery of signals which support appropriate 
behavior and the other has to do with the preven- 
tion of inputs that may encourage inappropriate 
behavior. We now turn to the discussion of two 
lesson formats which relate to the latter issue. 

Let us consider the case of an individual con-
struction lesson. The teacher provides each child
with scissors, paste, a sheet of paper, and magazine
pages showing pictures of food and suggests that
each child make a collage of desserts. After a child
begins such an activity, the major and persisting
external signals come from the changing conditions
of his materials. He selects a picture, but it must
be cut from the page; once cut, the picture re-
quires paste; when paste is applied, it needs to be
pressed onto the paper; the remaining space on
the paper and the pages of pictures signal selecting
another dessert, and so on. A continuous signal sys-
tem occurs as one action and its immediate result
provide impetus and guidance for the next action.
This signal system and all individual-construction
lessons thus provide continuous signals; they
should induce high involvement provided each
child has appropriate materials and is capable of
grasping the goal and of carrying out the necessary
participatory actions. It occurred to us, however,
that these individual-construction lessons contain
a dimension other than continuity which makes them
different from the other types of high-continuity
lessons. The signal source here, resting as it does
on the results of each child's own actions on his
own materials, produces a tight, closed behavior-
environment circuit. This closed circuit insulates
the lesson and shields each child from foreign in-
puts (distractions, other children's deviancies)
which may serve as stimuli to inappropriate be-
havior. Such a format, in addition to continuity,
contains a high degree of insulation. This lesson
type should produce very low nontask scores.

Another lesson type requiring special considera-
tion is the rather common music-and-movement
format. These lessons provide continuous signals
from a single source, records or teachers. However,
these also contain props (drums, bells) or con-
stituent actions (dancing, jumping, singing) which
produce intense stimuli. Since the children are not
in a closed-circuit system (such as in individual
construction), these stimuli are likely to intrude
into participants’ attention; that is, children begin
to key off one another as well as off the central
signal. Such lessons contain high intrusiveness. This
makes these lessons vulnerable to the spread of
inappropriate or deviant behavior, thus increasing
the likelihood of off-task scores relative to the
other single-emitter systems without intrusiveness.

A signal system code was developed employing 
the ideas just outlined; all 596 lessons were cate- 
gorized as belonging to one of the six lesson types 
(coder agreement 90%). The types, in abbreviated 
form, are presented in Table 1. The question the 
research effort asked was whether pupil behavior, 
specifically off-task behavior, had a statistically 
significant and psychologically meaningful relation 
to lesson types as these were described in signal 
system terms. 


RESULTS


Since the investigators exercised no con- 
trol over which teachers taught which 
lessons, data were not evenly balanced as 
they might have been in an experiment. 
Teachers did present a range of lessons: 33 
of the 36 teachers taught at least five of the
six lesson types. To maintain constant
teachers and groups, a series of t tests for 
related means was utilized. For example, all 
teachers and groups who operated in at least 
one individual-construction lesson and in at 
least one teacher (or record) story lesson 
were compared. If a teacher taught several 
lessons in any one type, her score was the 
mean off-task score for that type. Since not 
all teachers taught all six types of lessons, 
the Ns and the means will vary slightly for 
each comparison. 
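
The comparison procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the records are invented, pairing is simplified to the teacher (the study paired on teachers and groups), and the function names are hypothetical. Each teacher's score for a lesson type is the mean off-task percentage over her lessons of that type, and only teachers who taught both compared types enter the paired statistic.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def teacher_type_means(lessons):
    """Collapse (teacher, lesson_type, off_task_pct) records to one mean per pair."""
    grouped = defaultdict(list)
    for teacher, lesson_type, pct in lessons:
        grouped[(teacher, lesson_type)].append(pct)
    return {key: mean(values) for key, values in grouped.items()}


def paired_t(means, type_a, type_b):
    """Paired t statistic over teachers who taught both lesson types.

    Returns (t, n), where n is the number of teachers in the comparison.
    """
    taught_a = {teacher for (teacher, kind) in means if kind == type_a}
    taught_b = {teacher for (teacher, kind) in means if kind == type_b}
    teachers = sorted(taught_a & taught_b)
    diffs = [means[(t, type_a)] - means[(t, type_b)] for t in teachers]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two teachers who taught both types")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n


# Invented records; teacher "D" taught only Type 4 and drops out of the comparison.
example_lessons = [
    ("A", 1, 4.0), ("A", 1, 6.0), ("A", 4, 15.0),
    ("B", 1, 8.0), ("B", 4, 14.0),
    ("C", 1, 5.0), ("C", 4, 17.0),
    ("D", 4, 20.0),
]
means = teacher_type_means(example_lessons)
t_stat, n = paired_t(means, 1, 4)
print(round(t_stat, 2), n)  # roughly -5.29 with n = 3 for these invented numbers
```

Because the set of teachers differs from one pair of lesson types to the next, n changes across comparisons, which is why the Ns and means vary slightly.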

Table 1 displays lesson types and asso- 
ciated data for off-task percentages. The 
means in the column labelled X are for all 
teachers who taught those lessons; however, 
the means in Columns 2 through 6 represent 
means for teachers who taught both of the 
compared lesson types. 

In terms of success in inducing and sup- 
porting appropriate task involvement, the 
lesson types can be grouped into three 
levels. The best was Type 1; it produced 
significantly less off-task behavior than all 
other types. Such individual-construction
lessons contained continuous signals from
the results of each child's own action as well
as a postulated insulation from external in-
trusions, apparently a very effective ar-
rangement for this population.

The lessons with average success are the 
Type 2 and Type 3 formats. These are in- 
ferior to the individual-construction lessons, 
superior to the Type 4, 5, and 6 lessons, and 
not different from one another. Both Type 
2 and Type 3 possess a postulated high 
degree of continuity and freedom from 
gaps: this is true of books and records be-
cause of sequenced signals from a con-
tinually emitting source and it is true of
teacher-led demonstrations because of clear
sequencing and continually present central
props on which there is a continuous focus.
It would be reasonable to combine these 
into one type. When this is done, the com- 
bined type is less successful than Type 1 
(p < .001) and more successful than Type 
4 (p < .023), Type 5 (p < .005), and Type 
6 (p < .001). 

The least successful lessons are Types 4,
5, and 6. These produce significantly more
nontask behavior than all the others and are
not significantly different from one another.
Recitations, role play, and group construc-
tion have lacunae due to the absence of
continuous sequencing and/or to their de-
pendence upon potentially faltering inputs
from other children. Even though Type 6
lessons, movement and music, possess a
single, continuously emitting source, they
are vulnerable to inappropriate behavior be-
cause the intense props or behaviors in the
format are potentially intrusive.

The model presented here, then, suggests
that the pattern and quality of the signal
system is a crucial area in predicting child
involvement in prescribed lessons in pre-
school. Three dimensions related to signal
input which are suggested by the present
research include (a) continuity of signal
input, (b) insulation of participants from
potentially distracting stimuli, and (c) in-
trusiveness of respondent action.


DISCUSSION


The following paragraphs compare the
explanatory power and empirical validity
of this theory with some alternative ex-
planations for the results of this study.

A prevalent theory to account for chil-
dren's involvement in activities is that of a
simple "pleasure" theory: "children enjoy
making things" and "they like having
stories read to them." Insofar as these
particular lessons are concerned, we can
only point out that most of the children
looked forward to all of the lessons and ex-
pected to enjoy all of them. One of our codes
was designed to categorize the reaction of 
the children to a lesson at the point of their 
first knowing what it was to be about. The 
reaction of each child at the time the lesson 
was just started was categorized (with 90% 
intercoder reliability for a five-category 
code). Out of 3,002 codable, initial child 
reactions, 92% were “enthusiastic” or “posi- 
tive and attentive,” 3% were “partly atten- 
tive,” and 4% were "inattentive" or “nega- 
tive.” It is not surprising, then, that minor 
variations in initial expectations of pleasure 
yielded no significant relationship between 
various measures of initial attraction and 
involvement. 

There is one measure of the pleasure 
manifested by the children in the actual 
lessons. Sherman (1971) analyzed all events
of “group glee” in the lessons. This was
evidenced by one half or more of the group
manifesting glee by jumping merrily,
screaming joyfully, or laughing. These glee-
ful incidents occurred most frequently in the 
motor-expression lessons (Type 6). It is to 
be noted that these lessons were below aver- 
age in task involvement. The evidenced 
pleasure in these lessons does not mean that 
the lesson format induced the children to 
remain with the proper behavior require- 
ment nor that it contained protections from 
inappropriate or deviant behavior. 

While one might argue that signal sys- 
tem variations are related to task involve- 
ment only when some level of attraction is 
present, the data do not support the theory 
that the differences obtained among the 
lesson types are due to differences in the 
attraction of the activities. 

Other theories of activity have to do with
the kind of lesson activity involved. We
might point out that in our preliminary
coding of activities, we coded phenotypical
qualities of activities and found significant
relationships between these and the child
behavior scores. Among 21 categories of
activities, we found that lessons dealing
with construction, stories, or information
books had above-average involvement
scores; lessons dealing with concepts, dis- 
crimination, or categorizations had average 
scores; lessons dealing with kinesthetics, 
role play, or general talk had below-average 
scores. In a code containing eight categories 
of behavior requirements, we found that 
lessons in which children constructed or 
manipulated props had above-average in- 
volvement scores; lessons in which children 
listened or monitored had average scores; 
and lessons in which children narrated, 
engaged in bodily movements, or imagined 
had below-average scores. Although these 
phenotypical codes possessed predictive 
power, they did not satisfy us but, instead, 
tended to be redundant as well as incon- 
sistent with some of the findings. “Con- 
struction” lessons, for example, were high- 
involvement lessons if they were individual- 
construction lessons but were low-involve- 
ment lessons if they were group-construc- 
tion lessons. Listening or monitoring as a 
behavior mode was associated with high- 
involvement lessons if the source was a book 
or teacher, while listening was associated 
with low involvement if other children were 
the emitters. Similar inconsistencies were 
uncovered when we categorized behavior 
modes as active, passive, or semiactive. 
Listening or monitoring were categorized 
as passive yet were associated with different 
degrees of involvement dependent on what 
was being monitored. Prop manipulation 
might be categorized as semiactive since it 
entailed children “having something to do 
with their hands,” yet these types of be- 
haviors were associated with high involve- 
ment when in an individual-construction 
format and with low involvement when in 
a context of group construction or playing 
noisy instruments. While one might descrip-
tively categorize the lesson activities on the
basis of phenotypical content and behavior
requirements, we feel that the postulated
signal system model presents a more geno-
typical and explanatory theory and resolves
some of the inconsistencies noted.

The Sherman (1971) study of group glee
may be referred to as adding support to the
concepts of insulation and intrusiveness.
Group glee, which occurs either in simulta-
neous bursts or by spreading in a chainlike
fashion from one child to another, does pro-
vide an indication of the amount of child-
child interdependence occurring in a lesson.
In our theoretical analysis, we postulated
that the individual-construction format is
a closed-circuit system in which a child is
relatively insulated from inputs from other
children. We also theorized that the music-
movement formats possessed potential in-
trusiveness in which a child’s intense actions
would be likely to intrude into another
child's field of attention. The prevalence of
group glee provides a measure as to whether
this, in fact, does happen in these lessons.
Table 2 shows that the rate of glee in the
individual-construction lessons is signifi-
cantly lower than in all other lessons, sup-
porting the theory of its highest insulation.
The rate of glee in the music-movement
formats is significantly higher than in all
other lessons, supporting the theory of its
highest intrusiveness.

We deliberately made no attempt, in this
preliminary analysis of lesson activities, to
measure variations in the dimensions of con-
tinuity, insulation, or intrusiveness within
any input system. Rather, we are theorizing
that one can predict to task-related be-
havior of a group in a lesson to some degree
by a simple knowledge of the skeletal for-
mat of a lesson. The results support this
contention, even though variations exist
among the lessons within any one lesson
type.


CONCLUSIONS AND IMPLICATIONS 


The finding that properties of activity
settings function to mold the behavior of
occupants should have implications for re-
search and practice pertaining to teaching.
Perhaps we should devote less effort to the
pursuit of teacher characteristics and con-
centrate more on what techniques are re-
quired to program and manage different
kinds of activity settings. Different kinds of
techniques may be required for different
kinds of activity settings.


TABLE 2
MEAN RATE OF GLEE PER HOUR AND DIFFERENCES BETWEEN THE MEAN SCORES FOR THE VARIOUS TYPES
OF LESSONS

Lesson type

1. Signals from effects of own behavior on continuously present materials (making individual constructions).

2. Sequenced signals from a single, continuously emitting source (listening to teacher or records).

3. Teacher paces signals to children; also uses continuous, external signal source (exploring, reacting to demonstrations).

4. Recitation: with discrete, multiple child signals (dealing with concepts, categories, numbers).

5. Multiple, shifting signals primarily from child sources (role playing, general talking, group constructing).

6. Signals from central source and inputs from high intensity props or actions (singing, moving body).

Note. The means in Columns 2 through 6 represent means for teachers who taught both of the com-
pared lesson types. All p levels (one-tailed tests) are based on the Wilcoxon matched-pairs signed-
ranks test.


This viewpoint is supported by the studies 
of Gump (1969) which show differences in 
teacher and child behaviors dependent upon 
differences in classroom activity segments 
and of Kounin (1970) which demonstrate 
differences in the effectiveness of various 
teachers’ managerial techniques dependent 
upon whether they are conducting seatwork 
or recitation settings. 

One might note that the concept of con- 
tinuity of signals may well be related to 
some dimensions of teaching techniques that 
Kounin found related to the task involve- 
ment of elementary school children. Specifi-
cally, smoothness (non-jerkiness) and mo-
mentum (absence of slowdowns) were sig-
nificantly correlated with task involvement 
in recitation formats. Both momentum and 
smoothness may now be interpreted as pro- 
viding lag-free continuity of appropriate 
lesson inputs. 

Preliminary data strongly suggest that 
lesson formats are clearly related to dimen- 
sions of teaching behavior. For example, 
formats with continuous central emitters 
require teaching actions somewhat distinct 
from formats in which there is discontinuous 
central emission; insulated formats neces- 
sitate different actions than formats with 
intrusiveness. Current research is now ex-
amining the effectiveness of specific teacher
action in relation to the general format in
which it occurs.


INTELLIGENCE AND TRANSFER:


The subjects’ ability to learn to solve letter-series problems was in-
vestigated with low-transfer versus high-transfer practice and in
pretest versus no-pretest conditions. Aptitude by treatment interac-
tions were found with Otis IQ and the use of a pretest. Type of practice
interacted with Raven IQ. The results were interpreted in terms of
Ferguson's, Cattell’s, and Jensen’s models of the relationship between
intelligence and learning. Low-IQ subjects do benefit from direct prac-
tice on the strategies involved in the solution of complex problems.


1 This work was done under Grant D-6 from the
National Research Council of Canada. Thanks are
due Marilyn Tuck who adapted Dowaliby and
Berliner's (1971) ANALATI program for our use.

2 Requests for reprints should be sent to
Graham R. Skanes, Department of Psychology,
Memorial University of Newfoundland, St. John's,
Newfoundland, Canada A1C 5S7.

3 ATI is the term used by Cronbach and Snow
(1969), while Jensen (1970) used AII. Recent
reviews of the ATI literature have been provided
by Cronbach and Snow (1969) and Bracht (1970a,
1970b).


Ferguson (1954, 1956) has proposed that
general intelligence is correlated with posi-
tive transfer such that subjects with high
scores on intelligence tests are expected both
to profit more from practice on an intellec-
tual task and to show greater improvement
on a related task than low-scoring subjects.
This general prediction was confirmed by
Sullivan and Skanes (1971), who found that
high-scoring (bright) subjects had higher
transfer scores than low-scoring (dull) sub-
jects of similar mental age on a letter-series
reasoning task following relevant practice.
They also noted that the performance of
bright subjects was maximized when a pre-
test was given before the practice session,
while dull subjects exhibited their best
performance without the pretest, suggesting
the presence of an aptitude by treatment
interaction (ATI) or an aptitude by instruc-
tion interaction (AII). Cronbach and Snow
(1969) state that “An ATI exists, in effect,
when the regression outcome under treat-
ment A, upon certain pretreatment informa-
tion, differs in slope from the regression of
the same variables under treatment B [p.
4].” In other words, if the functional rela-
tionship between learning and intelligence
differs under different treatment conditions,
this constitutes an aptitude by treatment
interaction. In Sullivan and Skanes’ (1971)
case, it is impossible to speak of regression
slopes or functional relationships, since only
two extreme groups on the pretreatment
variable were used. Nevertheless, when one
group achieves optimum learning with one
treatment and a second group, differing on
intelligence, reveals a maximum with the
other treatment, it is reasonable to suspect
the existence of different slopes.
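
The slope-based definition of an ATI can be made concrete with a small sketch. The data below are invented purely for illustration, and the helper name `ols_slope` is hypothetical; the sketch only computes and compares the two least-squares slopes and does not test whether their difference is statistically reliable.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented data: the same IQ values observed under two treatments.
iq = [80, 90, 100, 110, 120]
outcome_a = [10, 14, 18, 22, 26]  # outcome rises steeply with IQ
outcome_b = [16, 17, 18, 19, 20]  # outcome rises only slightly with IQ

slope_a = ols_slope(iq, outcome_a)
slope_b = ols_slope(iq, outcome_b)

# A nonzero difference in slopes is what the quoted definition calls an ATI.
slope_difference = slope_a - slope_b
print(slope_a, slope_b)
```

In these invented data the two regression lines cross at an IQ of 100: below that point treatment B gives the higher outcome and above it treatment A does, which is the pattern suggested by the extreme-groups result described above.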

The purpose of this study was to deter- 
mine if different treatments lead to different 
outcomes for subjects varying on intelli- 
gence, as measured by standard intelligence 
tests. The different treatments were created 
in two ways: (a) by using different types 
of practice material, one directly related to 
the learning task (high transfer) and the 
other less so (low transfer), and (b) by 
using different procedures, one with a pre- 
test condition and one without (cf. Sullivan 
& Skanes, 1971). 


METHOD 


Subjects 


The subjects were 2,097 school children from 
Grades 5 to 9 in 13 schools in Corner Brook, New-
foundland. They ranged in age from 9 years 9
months to 18 years 8 months.


Materials


Intelligence tests. The Raven Progressive
Matrices (Raven, 1938) and the Otis Quick-Scoring
Mental Ability Test, Form CM (Otis, 1954) were 
administered to all subjects. 

Learning tests. The learning tests consisted of 
two forms (A and B) of a letter-series test similar 
to that in the Reasoning subtest of the Tests of 
Primary Mental Abilities (Thurstone, 1938). Each 
item of the 20-item test consisted of a series of 
letters which formed a logical sequence (see 
Sullivan & Skanes, 1971). The subject was required 
to figure out the sequences and to provide the next 
two letters in the series for each item, thus giving 
a maximum possible score of 40. 

Practice. Two forms of practice were used. The 
first (number or indirect practice) consisted of 40 
items, each of which was a number-series problem 
(i.e., a series of numbers arranged in a logical
sequence). The second (letter or direct practice) 
consisted of 20 alternating number-series and 
letter-series problems such that the odd-numbered 
items were number series and the even-numbered 
items were letter series. The odd-numbered items 
on both the number practice and the letter practice 
were identical, and the even-numbered problems 
on the number practice were equivalent to the 
even-numbered problems on the letter practice in
type of sequence presented and method of solution.


Procedure 


Both the Raven Progressive Matrices and the
Otis Quick-Scoring Mental Ability Test were
administered with 30-minute time limits to subjects
in their regular classroom groupings. Classrooms
(78 in all) were then randomly assigned to
the five treatment conditions with the constraint
that the number of classrooms from Grades 5, 6, 7,
8, and 9, respectively, be the same in each treatment


administered, followed in order by a 5-minute
break, the number-practice session, another 5-minute
break, and finally, Test B. (c) N-T. These
subjects had no pretest. They started with the
number practice, had a 5-minute break, and then
completed Test B. (d) T-L-T. Here the procedure
was the same as in condition T-N-T except
that the letter practice was substituted. (e) L-T.
These subjects were treated in the same way as
those in condition N-T except that the letter
practice was used.
The procedure followed during the practice
sessions was the same for the four experimental
groups. The experimenter solved the first item,
explaining the principle used, and had the subjects
write down the correct answer in the spaces
provided in the practice form. The subjects then
attempted the second problem on their own. This
procedure was followed for all of the subsequent
items: The experimenter solved and explained
the odd-numbered items, and the subjects completed
the even-numbered ones. Thus, groups
T-N-T and N-T had identical practice sessions, as
did groups T-L-T and L-T. Groups T-N-T and
N-T differed from groups T-L-T and L-T in that
the items completed by the subjects were either
number-series items (T-N-T and N-T) or letter-series
items (T-L-T and L-T) which paralleled the
number-series item just solved by the experimenter.
Each practice session took about 20
minutes. Any questions raised by the subjects were
answered, so that the session was as much like
regular classroom instruction as possible.


RESULTS 


Some characteristics of the five treatment
groups are given in Table 1. A multiple

TABLE 1
SOME CHARACTERISTICS OF THE TREATMENT GROUPS
T-T “No = s T 
iae ea T-N-T N-T T-L-T E 
x sD x SD x SD x SD x |g 
m 
Test B 8.1| 8.3| 12.7 | 9.6 be 
: : D | 11.3] 7.5| 14.2| 9.0| 13.3 
Age (months) 155.8 | 22.7 | 159.7 | 22.1 | 157.9 | 21.2 | 156.5 | 20.9 | 158.8 55 
Raven IQ 100.0 | 16.1 | 100.0 | 14.6 | 99.5 | 13.1 | 100.9 | 16.0 | 102.1 155
Otis IQ 98.9 | 13.6 | 100.8 | 13.1 | 99.7 | 12.0 | 101.5 | 11.8 | 101.3 | 94
rade 66! 14| 71| 14| esl 13, 7.0! 14| 7-0! P 
% Females 52 51 4 
. z .96 .56 ; 
a 406 409 426 413 bs 


Note. Abbreviations: T-T = Test A - Test B; T-N-T = Test A - number practice - Test B;
N-T = number practice - Test B; T-L-T = Test A - letter practice - Test B;
L-T = letter practice - Test B.
TABLE 2
MULTIPLE REGRESSION ANALYSIS: CONTROL
VERSUS PRACTICE GROUPS USING TEST B
AS THE DEPENDENT VARIABLE

Variable                               Proportion of variance(a)   F(b)

Practice - no practice (P)             .0267                       111.39**
Grade (G)                              .0184                        76.57**
Sex (S)                                .0127                        52.93**
Age (A)                                .0010                         4.04*
Otis IQ (O)                            .0388                       161.25**
Raven IQ (R)                           .0682                       283.92**
Interactions (P × G, P × S,
  P × A, P × O, & P × R)               .0015                         1.26

Note. n = 2,097.
(a) R² (full model) = .4994.
(b) df = 1/2,086 in all cases except Interactions, where df = 5/2,086.
* p < .05.
** p < .01.

regression analysis (Cohen, 1968; Overall & 
Spiegel, 1969) was performed to assess the 
effects of the practice using Test B as the 
dependent variable; that is, the control 
group (T-T) was compared with all the 
practice groups combined. The full model 
consisted of the main effects (practice 
versus no practice, grade, sex, age, Otis IQ, 
and Raven IQ) and the interactions of the 
practice—no-practice variable with the other 
main effects. The results are given in Table 
2. All main effects accounted for significant 
portions of the Test B variance. The proportion
of variance predicted by all the interactions
combined was not significant. Therefore,
the practice had the overall effect of
increasing Test B performance, but it did 
not do so differentially along any of the 
other main variables, that is, grade, sex, age, 
Otis IQ, or Raven IQ. 
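
The F ratios in Table 2 can be checked against the tabled proportions of variance. The following is a minimal sketch, assuming each tabled proportion is the increment in R² uniquely attributable to that term and that the error term is the residual of the full model; the function name `f_for_increment` is introduced here for illustration and is not taken from the original analysis.

```python
def f_for_increment(delta_r2, r2_full, df_num, df_den):
    """F ratio for a term that adds delta_r2 to the R^2 of the full model."""
    return (delta_r2 / df_num) / ((1.0 - r2_full) / df_den)


# Values copied from Table 2: full-model R^2 = .4994, error df = 2,086.
R2_FULL = 0.4994
DF_ERROR = 2086

# term -> (proportion of variance, numerator df)
table2_terms = {
    "Practice - no practice": (0.0267, 1),
    "Raven IQ": (0.0682, 1),
    "Interactions": (0.0015, 5),
}

f_values = {
    name: f_for_increment(prop, R2_FULL, df_num, DF_ERROR)
    for name, (prop, df_num) in table2_terms.items()
}

for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
```

Under these assumptions the recomputed values fall close to the tabled F ratios; small discrepancies are expected because the proportions are reported to only four decimal places.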

When the control group is excluded, the 
practice groups form a 2 × 2 factorial design
with pretest — no pretest as one factor and 
type of practice (number versus letter) as 
the second. A multiple regression analysis 
(Overall & Spiegel, 1969, Method 2) was 
performed.* The full model included all the 
main effects and the interactions of the 
two treatment factors with each of the 


* Correlation matrices for each grade level may
be obtained from the first author on request.

other main effects. Table 3 gives the results
of the analysis. There was no overall pre- 
test effect. All the other main effects were 
significant predictors of Test B score. The
two interactions listed in Table 3 were the 
only two second-order interactions involving 
the two treatment factors which were sig- 
nificant. The pretest-no-pretest variable 
interacted significantly with Otis IQ, and 
type of practice interacted with Raven IQ. 

The two significant interactions were fur- 
ther analyzed to see at what value of the 
predictor variable significant differences
would be obtained between the treatments. 
The Johnson-Neyman method was used 
(Johnson & Neyman, 1936; Walker & 
Lev, 1953) in a computer program written 
by Dowaliby and Berliner (1971). That 
program includes the Potthoff criterion 
(Potthoff, 1964), a more conservative test 
of the interactions than the original Johnson-Neyman
criterion (Cahen & Linn,
1971). 
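
The logic of the Johnson-Neyman method can be sketched as follows. This is an illustration only: it uses the simple pointwise criterion with a large-sample critical value of 1.96, not the Potthoff modification and not the Dowaliby and Berliner program, and the data are simulated with hypothetical intercepts and slopes rather than taken from the study. The names `fit_line`, `johnson_neyman`, and `simulate` are introduced here for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line for one group, plus the sums needed later."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, n, mean_x, sxx, sse


def johnson_neyman(group1, group2, t_crit=1.96):
    """Return (lower, upper, crossover): the groups differ significantly
    for aptitude values below `lower` or above `upper`."""
    a1, b1, n1, m1, sxx1, sse1 = fit_line(*group1)
    a2, b2, n2, m2, sxx2, sse2 = fit_line(*group2)
    pooled_var = (sse1 + sse2) / (n1 + n2 - 4)
    diff_a, diff_b = a1 - a2, b1 - b2
    k = t_crit ** 2 * pooled_var
    # Solve (diff_a + diff_b * x)^2 = k * [1/n1 + 1/n2
    #        + (x - m1)^2 / sxx1 + (x - m2)^2 / sxx2] for x.
    qa = diff_b ** 2 - k * (1 / sxx1 + 1 / sxx2)
    qb = 2 * diff_a * diff_b + 2 * k * (m1 / sxx1 + m2 / sxx2)
    qc = diff_a ** 2 - k * (1 / n1 + 1 / n2 + m1 ** 2 / sxx1 + m2 ** 2 / sxx2)
    disc = qb ** 2 - 4 * qa * qc
    if qa <= 0 or disc < 0:
        return None  # slopes do not differ reliably; no two-sided region
    root = math.sqrt(disc)
    lower, upper = sorted(((-qb - root) / (2 * qa), (-qb + root) / (2 * qa)))
    return lower, upper, -diff_a / diff_b


rng = random.Random(0)


def simulate(intercept, slope, n):
    """Hypothetical group: IQ-like aptitude scores and noisy test scores."""
    xs = [rng.gauss(100, 15) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, 5) for x in xs]
    return xs, ys


pretest = simulate(-8.0, 0.22, 800)      # steeper line (hypothetical values)
no_pretest = simulate(2.0, 0.12, 800)    # flatter line (hypothetical values)
region = johnson_neyman(pretest, no_pretest)
if region is not None:
    print("nonsignificant between %.1f and %.1f; lines cross at %.1f" % region)
```

The two boundaries play the role of the aptitude values marked on the figures: between them the treatments cannot be distinguished, and outside them one treatment is reliably superior.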

The regression lines for the Pretest x Otis 
IQ interaction are given in Figure 1. The 
values of Otis IQ, at which significant dif- 
ferences between the pretest and no-pretest 
conditions occur, are given. The proportion 
of cases in each region of the graph is also 
given. Therefore, at the .05 level of sig- 
nificance, children whose Otis IQs are above 
100 (about 75% of the cases) do signifi- 
cantly better with the pretest than without. 


TABLE 3

MULTIPLE REGRESSION ANALYSIS OF TREATMENT
GROUPS USING TEST B AS THE DEPENDENT
VARIABLE

Variable                       Proportion of variance(a)   F(b)

Pretest - no pretest (P)       .0011                         3.07
Type of practice (T)           .0036                        12.18*
Grade                          .0163                        55.58*
Sex                            .0132                        44.98*
Age                            .0020                         6.93*
Otis IQ (O)                    .0480                       166.74*
Raven IQ (R)                   .0781                       265.95*
P × O                          .0022                         7.05*
T × R                          .0021                         7.27*

Note. n = 1,691.
(a) R² (full model) = .5089.
(b) df = 1/1,673.
* p < .01.
FIGURE 1. The Otis IQ × Pretest-No-Pretest
interaction.


The reverse is true for children whose IQs 
are less than 84 (about 8% of the cases). 
The regression lines making up the Raven 
x Type of Practice interaction are given 
in Figure 2. At the .05 level, for children 
with Raven IQs less than 104 (60% of the 
cases), the direct practice yielded higher 
Test B scores; while for children with 
Raven IQs above 133 (about 1% of the 
cases), the indirect practice proved superior. 


DISCUSSION


The experiment reported here has demon- 
strated aptitude by treatment interactions 
using standard intelligence tests as aptitude 
measures. Low transfer versus high transfer 
and the use of a pretest as opposed to no 
pretest were the treatments. Cronbach and 
Snow (1969) have suggested that broad 
measures of aptitude such as IQ would be 
the best place to look for aptitude by treat- 
ment interactions, and the results obtained 
here support their view. 

The use of a pretest. Campbell and 
Stanley (1963) have suggested several pos- 
sible effects of pretesting in experimental 
methodology. The usual effect is that pre- 
testing leads to improved performance on 
retest. The results reported here demonstrate 
that the effect of a pretest is a function of 
the intelligence of the child. Therefore, it is 
possible that in some experiments a pretest
might obscure treatment effects in subjects
with certain characteristics. The indiscriminate
use of pretests in experiments
might obscure possible treatment effects
because the pretest interacts with the treatment
to produce a decrement in performance.

Samuels (1969) and Hartley (1971) have,
however, supported the view that a pretest
enhances performance after practice.
Samuels found that pretests with feedback
resulted in greater retention of the material
read. Hartley, using programmed instruction,
found that the pretest has no
appreciable effect if learning is efficient but
that it facilitates performance when the task
is difficult. Sullivan and Skanes (1971)
added that the pretest effect is a function
of the ability of the learner. The aptitude
by treatment interaction reported here
strengthens that view because the pretest
effect is found whether the practice
used is direct or indirect. The effect appears
when aptitude is measured by the Otis IQ
test but not when it is measured by the
Raven. For high-Otis-IQ subjects, the pretest
resulted in higher performance than no
pretest. For low-Otis-IQ subjects, the no-pretest
condition was superior to the pretest
condition.

Little can be said for certain here about
the cause of the pretest effect. It might be
motivational, with lower-IQ subjects becoming
discouraged by the difficult and
probably meaningless problems with which
they are faced. Hartley (1971) preferred an
explanation in terms of tuning. He suggested
that the pretest will “not only . . . alter one’s
expectations to what is required, but . . .


FIGURE 2. The Raven IQ × Type of Practice
interaction.
seems to assist in the organization of other
related material so that it is easily remem- 
bered [p. 143]." 

The task used here appears to require 
of the subject that he formulate hy- 
potheses about the solution of the problem 
encountered (e.g., "two ascending series
interposed"). He then has to check the
hypothesis against the material in view and, 
if it fits, extend it. That process is akin to
"formal operational reasoning," the highest 
stage of intellectual development in Piaget's 
system. The pretest gives subjects an op- 
portunity to see what the problem is and 
what possible solutions there are. High-IQ 
subjects can formulate and test hypotheses 
during the pretest; low-IQ subjects probably 
cannot. The practice session for high-IQ 
subjects is a session in which already formu- 
lated notions about strategies are honed and 
sharpened, incorrect ones rejected, and new 
leads formed. The practice session for the 
low-IQ subjects is imposed on a muddled 
background and probably serves to consoli- 
date confusion. 

Differential transfer conditions. It is in- 
teresting from the point of view of Fergu- 
son's (1954, 1956) theory of transfer and 
intelligence that the Raven IQ, generally 
agreed to be the best single measure of
general intelligence in Spearman's sense,
yielded an interaction with the different
transfer treatments but that the Otis IQ did 
not. Ferguson has suggested that general 
intelligence is positively related to the 
ability to transfer. The fact that the regres- 
sion line relating learning to Raven IQ in 
the indirect or low-transfer condition was 
significantly steeper than in the direct or 
high-transfer case indicates that Ferguson 
is correct. Subjects whose level of general 
intelligence was high used the difficult trans- 
fer situation to advantage, while the low-IQ 
subjects were unable to do so and profited 
more from the direct training.

The aptitude by treatment interactions 
found in this study are important in view of 
the current controversy concerning the 
relevance of genetic or constitutional factors 
in IQ and learning. Two related but sepa- 
rate propositions have been espoused: first, 
by Cattell and his associates (Cattell, 1963; 
Horn, 1968) and, second, by Jensen (1969a,
1969b, 1970). 

In the Cattell (1963) model, there are 
two types of intelligence, fluid and crystallized.
The former is much like ability to
transfer, while the latter resembles the 
ability to acquire information. Horn (1968),
in reviewing the characteristics of various 
tasks, has noted that the Raven is a good 
measure of fluid intelligence and that such 
series tests as the ones used here for the 
learning task are also good measures of 
fluid ability. Fluid ability in the Cattell view 
is highly constitutional in nature, and in 
Horn's scheme, it is related to neurological 
functioning. 

Our results can be interpreted in the fol- 
lowing way using Cattell’s model. The 
direct practice (high-transfer) condition is 
analogous to a situation in which crystal- 
lized ability is useful. The indirect practice 
(low-transfer) condition is one in which 
fluid intelligence would be needed. The slope 
of the regression line in the crystallized case 
was flatter than in the fluid, confirming the 
idea that the low-transfer situation was 
more directly related to an accepted measure
of fluid intelligence than was the other.

In a similar way, Jensen’s (1969a, 1969b, 
1970) model can be used as a framework 
onto which the results can be fitted. Jen- 
sen has proposed that some complex tasks 
(tasks requiring reasoning as opposed to 
rote learning) are relatively free of cultural 
factors and, hence, are highly heritable. The 
Raven Progressive Matrices is one such 
task. We can conclude, in Jensen’s terms, 
then, that the direct practice condition is 
not as highly dependent upon inherited 
ability as the indirect practice condition be- 
cause the former is less directly related to 
Raven IQ. 

The data, therefore, support the Cattell 
and Jensen views. But more important, they 
show a way around the conclusions that
Cattell and Jensen appear to formulate;
namely, that people of low fluid intelligence
or low, culture-free, complex learning ability 
might be unable, because the factors are 
inherited or constitutional, to do complex 
tasks. Our data show that when a complex 
task is presented in a direct manner, substantial
improvement in performance is
possible. 

Guinagh (1971) has done something simi- 
lar. He trained children on the strategies 
relevant to solving the problems in the 
Raven Progressive Matrices. Even groups 
whose performance would be predicted to 
remain low because of presumed genetic im- 
poverishment, showed substantial improve- 
ment. In other words, direct training on the 
strategies to be employed to do complex 
tasks leads to increased performance by 
subjects who would not be expected to per- 
form well. Or, strategies can be learned by 
rote and applied to problems in much the 
same way as if the strategies were derived 
by the subject. 

The conclusion from this study and from 
Guinagh's (1971) study is that the effort 
to make tests culture fair by choosing novel 
material (e.g., Raven’s matrices) or com- 
pletely familiar material (e.g., the alpha- 
bet; Horn, 1968) did not succeed because 
test makers neglected to ensure that the 
relevant strategies were equally novel or 
equally familiar. While the evidence pre- 
sented here in no way suggests an elimina- 
tion of intelligence factors in learning, it 
does point the way to a modification of the 
extreme interpretation of Jensen’s work. 
COGNITIVE STYLE AND REASONING ABOUT SPEED

College students were given a test for field dependency, and they were
asked to reason about the relative speeds of horses turning on a merry- 
go-round platform. To promote the recognition that horses located on 
the outside of the platform traveled faster than horses on the inside, 
additional problems were presented which directed subjects’ attention 
to the relevant variables. Results revealed that unlike field independ- 
ent subjects who reasoned correctly from the outset, field dependent 
subjects failed to think analytically. They were misled by perceptually
salient aspects of the situation; they tended to center on these features 
in their reasoning; and they resisted accommodating to additional
information.


Variations in cognitive styles have been 
of interest to investigators as a means of 
accounting for individual differences in per- 
formance on perceptual and conceptual 
tasks. Not only Witkin and his associates 
(Witkin, Dyk, Faterson, Goodenough, &
Karp, 1962; Witkin, Lewis, Hertzman, 
Machover, Meissner, & Wapner, 1954) but 
also Kagan, Moss, and Sigel (1963) have 
produced evidence suggesting that two op- 
posing modes of thinking can be identified: 
(a) analytic or field independent, the ability 
to distinguish, hold in mind, and coordinate 
the relevant figures or attributes extracted 
from a more complex and sometimes distracting
stimulus context, and (b) global or
field dependent, a more passive, unanalyzed 
processing of the stimulus in which its holistic
nature is preserved and only the most
obvious relations among its parts are manip- 
ulated mentally. This distinction has been 
incorporated in a performance model of cog- 
nitive development proposed by Pascual- 
Leone (1970). He has suggested that Piaget’s 
theory of intellectual functioning be extended
to include a central processor or
computing space whose function it is to
transform and coordinate information and
whose capacity expands as the child grows.
And he has reported evidence indicating that 
field dependent subjects, unlike field independent
subjects, often fail to utilize the full
structural capacities of their central proces- 
sors. That is, they do not extract and 
mentally manipulate the maximum number 
of independent pieces of information which 
their computing spaces are capable of 
handling. As a consequence, these subjects 
display reasoning patterns similar to those 
of younger children observed by Piaget and 
Inhelder (1941). 

The relevance of cognitive styles and Pas- 
cual-Leone’s model of information process- 
ing became of special interest to the present 
authors as a means of accounting for some 
puzzling differences observed in the thinking 
of college students. In response to the question,
"Would a child riding on the outside row
of horses attached to a moving merry-go-round
platform be traveling faster than a
child circling on the inside row of horses or
would they be moving at the same speed?",
some people argued that the outside moved 
faster, but an equal number insisted that 
both moved at the same speed, and a few 
claimed that the inside traveled faster. 
Analysis of the reasons presented in support 
of these positions pointed to the possibility 
that patterns might be reflecting differences 
in the general style of thinking adopted by 
adults to solve problems. To determine 
whether alternative solutions to the moving 
merry-go-round problem might be related to 
differences in field dependency among 
adults, the following study was undertaken. 
METHOD


Four verbally stated problems were constructed. 
The first instructed subjects to think of a merry- 
go-round which has two circles of horses. One cir- 
cle is on the outside of the platform. The other 
circle is on the inside. Billy selects a horse on the 
outside to ride. Danny picks a horse on the in- 
side. The merry-go-round begins turning. Is one 
boy going faster than the other boy or are they 
both going the same speed? The subjects were re- 
quired to check one of the three alternatives (i.e., 
outside faster, inside faster, same speed) and then 
to explain their choice. Three problems designed 
to assist the subjects in focusing upon relevant as- 
pects of the moving merry-go-round problem were 
subsequently presented. These required subjects to 
notice differences in distance traveled by tracing 
the routes of the horses, to think about an analo- 
gous situation in which children hold hands and 
run around a post (i.e., Do some children run
faster?), and to consider the formula for speed. 
Following each of these problems, the subjects 
were asked to reconsider their answers to the mov- 
ing merry-go-round problem and to decide whether 
they still believed their answers to be true or 
whether they wished to alter their positions. 

Field dependence was measured with an em- 
bedded figures test, the Components subtest (Flan- 
agan Aptitude Classification Tests, 1958). It con- 
sisted of two parts, each part requiring the subjects 
to identify which of 5 simple geometric figures was 
embedded in each of 20 very complex figures. 

All tasks were printed and administered to 
groups of students (n = 61) enrolled at the University
of California, Davis. The subjects provided
written answers. 


TABLE 1

RESULTS OF THE SERIES OF MOVING MERRY-GO-ROUND PROBLEMS

                                  Moving merry-go-round problems
Score on embedded                 Solved       Solved     Never
figures test(a)                   initially    by end     solved    Total

Field dependent (below 20)         1            3         11        15
Middle range (20-30)               9           12          8        29
Field independent (above 30)      11            4          2        17
Total                             21           19         21        61


Note. Results indicate the number of field inde- 
pendent and field dependent subjects who con- 
cluded or never concluded that the outside child 
was moving faster in the series of moving merry- 
go-round problems. 

(a) The scores are those obtained on the Components
subtest of the Flanagan Aptitude Classification
Tests.
RESULTS


Performances on the four problems were
quantified. The subjects were divided into
three groups: (a) those who concluded
from the beginning that the outside child
was going faster (n = 21); (b) those who
shifted to this position by the end of the
problems (n = 19); (c) those who never
arrived at this conclusion (n = 21). A score
of 1, 2, or 3 was assigned to each subject
according to his group membership, and
these values were correlated with scores on
the embedded figures test. Results revealed 
that field dependence was indeed related to 
performance on the moving merry-go-round 
problem (r = —.49, p < .01). This relation- 
ship was verified in another way (see Table 
1). A chi-square test revealed that the dis- 
tributions of field dependent and field inde- 
pendent thinkers in the three groups 
were distinctly different (χ² = 14.58,
df = 2, p < .01). The majority of field 
independent subjects tended to reason from 
the beginning that the outside child was 
traveling faster. In contrast, most field de- 
pendent subjects neither proposed this solu- 
tion initially nor arrived at the “outside” 
answer by the end of the problem set. Al- 
though the additional problems were suc- 
cessful in altering the views of 19 subjects, 
few of these (only 3 subjects) were classed
as field dependent while the majority fell in 
the middle range on the field dependency 
measure. 
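
The chi-square statistic can be recomputed from the frequencies in Table 1. The sketch below rests on two assumptions: that the test compared only the field dependent and field independent rows (which is what df = 2 implies for a 2 × 3 table), and that the field dependent "never solved" cell is 11, the value required by that row's total of 15. The function name `chi_square` is introduced here for illustration.

```python
def chi_square(table):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Columns: solved initially, solved by end, never solved.
observed = [
    [1, 3, 11],   # field dependent (below 20); 11 inferred from the row total
    [11, 4, 2],   # field independent (above 30)
]

stat, df = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Under these assumptions the statistic comes out near 14.6 with 2 degrees of freedom, in close agreement with the reported value.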

Of the 21 subjects who reasoned correctly 
from the outset, 17 provided complete ex- 
planations citing the greater distance 
covered in the same amount of time or in
one revolution by the outside child. The re- 
mainder either mentioned only one of these 
variables explicitly or reasoned by a
crack-the-whip analogy.

Examination of the reasoning patterns of 
subjects who persisted throughout the problems
in arguing that the boys were going
the same speed revealed the following justifi- 
cations: the platform consists of one ¢ 
turning at one, not two speeds; the platform 
is the only thing moving, not the horses,
which are attached to it; there exists only
one belt or engine generating one, not two 
movements. Although these subjects saw the
outside child as moving faster in the post 
problem, they all regarded this as unrelated 
to the moving merry-go-round problem for 
reasons which focused upon unanalyzed 
physical differences in the two situations 
(i.e., “the children are running, the horses 
are not"). None of the subjects distinguished 
between linear and rotational notions of 
speed to resolve the discrepancy. Such rea- 
soning provides a clear illustration of field 
dependent thinking in a context which is 
quite different from the embedded figures 
test. 
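
The distinction between linear and rotational speed that would resolve the discrepancy can be made concrete with a short calculation. This is a sketch with hypothetical radii and rotation period; none of these numbers appear in the study.

```python
import math


def linear_speed(radius_m, period_s):
    """Distance covered per second: one circumference per revolution."""
    return 2 * math.pi * radius_m / period_s


def angular_speed(period_s):
    """Radians turned per second; the same for every point on the platform."""
    return 2 * math.pi / period_s


# Hypothetical merry-go-round: one revolution every 10 seconds.
PERIOD_S = 10.0
INNER_RADIUS_M = 3.0
OUTER_RADIUS_M = 5.0

inner_speed = linear_speed(INNER_RADIUS_M, PERIOD_S)
outer_speed = linear_speed(OUTER_RADIUS_M, PERIOD_S)

print(f"angular speed (both riders): {angular_speed(PERIOD_S):.3f} rad/s")
print(f"inner rider: {inner_speed:.2f} m/s, outer rider: {outer_speed:.2f} m/s")
```

Both riders complete a revolution in the same time, so their rotational speed is identical, but the outside rider covers a longer path in that time and therefore has the greater linear speed, in proportion to the radius.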

Although the majority of the faulty argu- 
ments favored "same speed," there were 
some subjects who mentioned that the inside 
child might be moving faster. In their ex- 
planations, these subjects appeared to forget 
that the horses remained evenly positioned 
on the platform. Noticing that the circumferences
of the paths differed, they decided
that the inside child could not be going 
slower and might even be circling faster 
since he had less far to go than the outside 
child. It was observed, in fact, that the 
route-tracing problem which focused at- 
tention on these differences in distance had 
the effect of strengthening this line of rea- 
soning in seven cases. 


DISCUSSION


Reasoning exposed in the set of moving 
merry-go-round problems was successful in 
providing evidence consistent both with 
cognitive style distinctions and with Pas- 
cual-Leone’s (1970) information-processing 
model. Whereas field independent subjects
were able to analyze the stimulus contexts, 
extract the relevant variables, and coordi- 
nate them appropriately, the thinking of 
field dependent subjects tended to be domi- 
nated by the perceptible physical properties 
of the total stimulus configuration, and these 
subjects were most resistant to the influence 
of prompts hinting at another line of reason- 
ing. Of course, given that a global, intuitive 
interpretation lacks clear structuring among 
parts, it is perhaps not surprising that such 
a product is less amenable to either slight 
or substantial modification in the face of 
additional information. 
The purpose of this study is to use some techniques developed for describing
the “environments” of U.S. universities to explore the correlation
between national characteristics and university characteristics
in the British Commonwealth. Because such techniques have been little
used outside the United States, the study first examines their appropriateness
for describing Commonwealth universities. Finally, the
study uses one of these techniques to examine the collegial organization
of Cambridge University and Oxford University, England. Results
for 186 universities suggest that these techniques are appropriate
for characterizing Commonwealth universities, including Oxford
and Cambridge colleges, and that national characteristics and university
environments correlate fairly meaningfully in the Commonwealth.
Therefore, this study helps provide a broader, more international context
for studying university environments.


Universities have goals and traditions 
that transcend differences among nations; 
therefore, an interesting question is whether 
nations with different characteristics also 
have universities with different character- 
istics. Techniques for describing how col- 
leges and universities differ have been 
developed by psychologists and other be- 
havioral scientists in the United States 
(Astin, 1962, 1968; Astin & Holland, 1961; 
Pace, 1963; Richards, Bulkeley, & Richards, 
1972; Richards, Seligman, & Jones, 1970; 
Stern, 1970), but only a few studies 
(Richards, 1973; Richards, Rand, & Rand, 
1968) have used these techniques to study 
institutions outside the United States. The
purpose of the present study is to use some 
of these techniques to explore the extent 
to which national characteristics and uni- 
versity characteristics (or university “en- 
vironments”) correlate in the British Com- 
monwealth. Because such techniques have 
been little used outside the United States, 
the study first examines evidence of the appropriateness
of these techniques for characterizing
British Commonwealth universities.
Finally, the study uses one of these
techniques to examine the collegial organization
of Oxford University and Cambridge
University, England. Holland's (1973)
theory of interpersonal environments provides
an overall framework for much of this
investigation.


METHOD


Procedure 


Sample of universities. The source of data for
this study was the Commonwealth Universities
Yearbook 1969 (Association of Commonwealth
Universities, 1969). This compendium includes
data for member institutions of the Association
plus universities in Ireland and South Africa
and presents detailed information for all
universities except those established most
recently.

The sample consisted of 186 universities with
complete data. Whenever the Yearbook permitted,
geographically separated campuses under
a common administration were treated as separate
institutions. For example, the three campuses of
the University of the West Indies (in Jamaica,
Trinidad, and Barbados) were treated separately.


* Inclusion of universities from a given nation,
of course, implies neither approval nor disapproval
of the social and political organization of that
nation.


The major problems, and perhaps the major 
biases, for this study occurred in India and 
Pakistan, where most universities are "affiliative" 
examining bodies with little if any centralized 
faculty and instruction. This study includes only 
those universities in these two countries that ap- 
peared to be "unitary teaching" universities, either 
by official designation or in terms of the data 
presented in the Yearbook. The examining-body 
form of university organization is modeled on
the University of London, England, and that
university too was excluded from this study to
reduce differential bias among nations. For analy- 
ses at the university level, Oxford University and 
Cambridge University, England, were treated as 
unitary institutions despite their collegial orga- 
nization. 

It is arguable whether this group of universities 
constitutes a sample or a population (and there- 
fore whether it is meaningful to use tests of sta- 
tistical significance). In a strict statistical sense, 
this group seems more nearly a population, but 
this study is of less interest if one does not make 
some sort of generalization beyond these particular 
universities. Also, some readers may choose to re- 
gard these universities as a sample and wish to 
know the statistical significance of the results. Ac- 
cordingly, significance levels are reported where 
computable. 

Environmental measures. Two sets of environ- 
mental measures were used in this study. The first 
grows out of Holland's (1973) theory that people 
in various occupations have characteristic per- 
sonalities, either because people with differing 
personalities are attracted to characteristic occu- 
pations or because differing occupations mold 
personality in characteristic ways. Specifically, 
one’s occupation corresponds to one’s personality 
type—realistic, investigative, artistic, social, enter- 
prising, or conventional. It follows that an inter- 
personal environment is a function of the number, 
or relative number, of people with these personal- 
ity types composing that environment. The basic 
procedure for assessing an environment, there- 
fore, is simply to count the number of people in 
that environment whose occupations fall into each 
of the six types. Usually some control for the size 
of the environment (i.e., the total number of
people) is necessary, such as computing the per-
centage of the people who fall into the six types.
Variations of this procedure have been successfully 
applied to measuring college and university en- 
vironments in the United States (Astin, 1965; 
Astin & Holland, 1961; Richards et al., 1970, 1972) 
and in Japan (Richards, 1973).

In the present study, the number of faculty 
members at each Commonwealth university fall- 
ing into each type was determined. To reduce 
skewness, a square root transformation was used 
for most analyses involving the number of faculty 
falling into the six types. Assignment to types was 
based on the specific academic discipline cited for 
each individual faculty member in the Yearbook. 
Disciplines were assigned to types on the basis of 
several empirical classifications for occupations
and major fields developed by Holland and his 
associates (Holland, 1966; Holland, Viernstein, 
Kuo, Karweit, & Blum, 1970; Viernstein, 1971). 
Realistic fields are typified by agriculture and 
civil engineering; investigative fields by biology 
and physics; artistic fields by literature and foreign 
languages; social fields by education and sociol- 
ogy; enterprising fields by economics and law; 
and conventional fields by accounting. The num- 
ber of faculty in a given type not only measures 
absolute emphasis on that type but also strongly 
involves university size. The percentage of faculty 
in each type at each university was also com- 
puted to measure relative emphasis on the six 
types. These percentages were transformed sepa- 
rately for each type to normalized standard scores 
by the percentile procedure. While this trans- 
formation did succeed in making the scores more 
nearly normal, its major effect was to place them 
on a scale more suitable for correlational analysis. 
Regardless of the transformation, of course, the 
percentage scores are ipsative in that a university 
with a high score on one type must have low 
scores on the other types. Therefore, statistical 
tests involving the six distributions of percentages 
are not completely independent. 
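
As a minimal sketch of the scoring steps described above (counting faculty by type, the square root transformation, percentages, and percentile-based normalized scores), the following Python fragment may be helpful. The discipline lookup contains only the example fields named in the text, the function names are hypothetical, and the mid-rank percentile rule is an assumption; the study does not spell out its exact percentile procedure.

```python
import math
from collections import Counter
from statistics import NormalDist

# Hypothetical lookup built only from the example fields named in the text.
DISCIPLINE_TYPE = {
    "agriculture": "realistic", "civil engineering": "realistic",
    "biology": "investigative", "physics": "investigative",
    "literature": "artistic", "foreign languages": "artistic",
    "education": "social", "sociology": "social",
    "economics": "enterprising", "law": "enterprising",
    "accounting": "conventional",
}
TYPES = ["realistic", "investigative", "artistic",
         "social", "enterprising", "conventional"]


def type_counts(disciplines):
    """Count faculty members falling into each of the six types."""
    counts = Counter(DISCIPLINE_TYPE[d] for d in disciplines)
    return {t: counts.get(t, 0) for t in TYPES}


def sqrt_counts(counts):
    """Square root transformation used to reduce skewness of the counts."""
    return {t: math.sqrt(n) for t, n in counts.items()}


def percentages(counts):
    """Relative emphasis: percentage of faculty in each type (ipsative)."""
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()}


def normalized_scores(values):
    """Normalized standard scores for one type across universities.

    Assumption: each value is converted to a mid-rank percentile and then
    to the corresponding standard normal deviate.
    """
    n = len(values)
    scores = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        percentile = (below + 0.5 * equal) / n
        scores.append(NormalDist().inv_cdf(percentile))
    return scores


# Made-up faculty list for one university.
faculty = ["physics", "physics", "biology", "law", "literature"]
counts = type_counts(faculty)
shares = percentages(counts)
```

Because the six percentages for a university always sum to 100, a high share in one type forces lower shares elsewhere, which is the ipsative property noted in the text.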

The second set of environmental measures used 
in this study are based on factor-analytic investi- 
gations of the environments of two-year (Rich- 
ards, Rand, & Rand, 1966) and four-year (Astin, 
1962) institutions in the United States and of uni- 
versities in Japan (Richards, 1973). Three factors 
appeared common to all of these studies. The
first was simply institutional size. The second was
affluence in the sense of having a large budget
and extensive facilities (e.g., library, faculty, etc.)
relative to the number of students. The third
might best be termed technological emphasis.
High-scoring institutions offer extensive training 
in engineering, have a large percentage of males in 
their student body, and are characterized by secu- 
lar (rather than religious) control or orientation. 
Because comparable results were obtained in the 
United States and Japan, it appears that these 
three dimensions may be appropriate for universities
in many nations.

Accordingly, measures of size, affluence, and
technological emphasis were developed for each
university.* Size was measured by the total number
of full-time students. A square root transformation
was made to reduce skewness. The measure of
affluence involved two variables: (a) university
income per full-time student (in Canadian dollars
at 1969 exchange rates) and (b) the number of
library books per full-time student. These
variables were equated for mean and standard
deviation and combined with unit weight.
Technological emphasis was also measured by a
unit-weighted combination of two equated variables
chosen on the basis of empirical results from
previous studies (Astin, 1962; Richards, 1973). To
avoid overlap with the realistic type score, no
direct measure of technological education was
included. The first variable was the percentage of
males in the full-time student body, and the second
was a measure of secular orientation in which
(prior to the equating transformation) universities
under religious control scored 1, secular
universities offering training in religious subjects
scored 2, and secular universities without religious
training scored 3. Last, the distributions on size,
wealth, and technological emphasis were converted
separately to (unnormalized) standard scores
(M = 50, SD = 10).

* It appeared more efficient, and more appropriate
to the Yearbook data, to calculate these measures
directly rather than first conducting a similar
factor-analytic investigation of Commonwealth
universities.
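
The composite scoring described above (equating two variables for mean and standard deviation, combining them with unit weight, and converting the result to standard scores with M = 50 and SD = 10) can be sketched as follows. The data values and function names are invented for illustration, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(values, new_mean=0.0, new_sd=1.0):
    """Linear (unnormalized) standard scores with the requested mean and SD."""
    m, s = mean(values), pstdev(values)
    return [new_mean + new_sd * (v - m) / s for v in values]


def unit_weighted_composite(var_a, var_b):
    """Equate two variables for mean and SD, then add them with unit weight."""
    za, zb = standardize(var_a), standardize(var_b)
    return [a + b for a, b in zip(za, zb)]


# Made-up values for four universities.
income_per_student = [1200.0, 2500.0, 1800.0, 4000.0]
books_per_student = [40.0, 90.0, 55.0, 150.0]

affluence = standardize(
    unit_weighted_composite(income_per_student, books_per_student),
    new_mean=50.0,
    new_sd=10.0,
)
```

Equating the two variables first keeps either one from dominating the sum merely because it is expressed in larger units.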

All of these variables represent nonreactive 
measures (Webb, Campbell, Schwartz, & Se- 
chrest, 1966) of the “objective” university en- 
vironment. In the United States, such measures 
have been shown to be correlated with reactive 
measures of the psychological, or phenomenologi- 
cal, university environment (Astin, 1963; Richards 
et al., 1970), but it has not been determined
whether this is true in the British Commonwealth. 
It is also undetermined whether Holland's no- 
tions about the relationship between occupation 
and personality apply in the Commonwealth, al- 
though there is some evidence (Lonner, 1968) that 
vocational interests and some occupations are 
similarly related in several non-Commonwealth 
nations. Finally, this study represents a “man from
Mars” point of view (Townsend, 1971) in that the
investigator has no personal experience with any 
Commonwealth university. Therefore, this first 
study of Commonwealth university environments 
should be interpreted with restraint. 


Results


TABLE 1

MEASURES OF CENTRAL TENDENCY AND VARIABILITY FOR FACULTY TYPES

                   No. faculty          % faculty
Type scores        M        SD        Median    Interquartile range
Realistic          5.15     3.37       7.96      6.29
Investigative     11.29     5.80      42.14      9.67
Artistic           6.87     3.38      17.17      6.42
Social             6.63     5.86      14.86      6.54
Enterprising       4.94     2.83       8.71      3.51
Conventional       1.03     1.22       0.13      0.40


TABLE 2

CORRELATIONS OF TYPE SCORES WITH SIZE, AFFLUENCE, AND TECHNOLOGICAL EMPHASIS FOR ALL COMMONWEALTH UNIVERSITIES COMBINED

Type scores        Size           Affluence    Technological emphasis
No. faculty
  Realistic         .68**         -.19**       [illegible]
  Investigative     .84**          .03         -.04
  Artistic          .80**          .08         -.38**
  Social            .48**          .06         -.29**
  Enterprising      .73**          .05         -.26**
  Conventional      .52**         -.05         -.23**
% faculty
  Realistic        [illegible]    -.28**        .61**
  Investigative    [illegible]     .01         [illegible]
  Artistic         -.05            .12*        -.57**
  Social           -.10            .17**       -.62**
  Enterprising      .01            .06         -.29**
  Conventional      .28**         -.03         -.22**

Note. n = 186.
* p < .05.
** p < .01.


Table 1 summarizes the central tendencies
and variabilities of the type scores. A
marginal totals chi-square, based on the
total number in each type at each university,
rejected (χ² = 14,386.92, df = 925)
the null hypothesis of no variation among
universities with respect to type distributions
well beyond the .01 level. Most emphasis
is placed on investigative, artistic,
and social fields and the least on conventional
fields. In other words, Commonwealth
universities appear to place most emphasis
on the physical and biological sciences, next
most on the liberal arts, and next most on
such fields as teacher education, the social
sciences, etc. This pattern is similar to the
one obtained for universities in the United
States (Richards et al., 1970) and in Japan
(Richards, 1973).
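
The marginal totals chi-square reported above tests whether the distribution of faculty across types is the same at every university. A minimal sketch of that computation for a universities-by-types table of counts is given below; the small table is invented, and the function name is hypothetical.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square for a rows-by-columns table of counts.

    Expected counts come from the marginal totals:
    expected[i][j] = row_total[i] * column_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Made-up counts: two universities (rows) by two types (columns).
chi2, df = chi_square_homogeneity([[10, 20], [20, 10]])
```

With 186 universities and six types, the degrees of freedom are (186 - 1)(6 - 1) = 925, which agrees with the value reported in the text.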

Next, correlations were computed between
the type scores and the measures of
size, affluence, and technological emphasis.
Table 2 summarizes these results. More
than half of the correlations are significant
and each environmental measure is correlated
with several type scores.* These correlations
seem to reflect primarily contrasts
between large and small universities and
between universities which do and do not
emphasize technological training. The pattern
of correlations between technological emphasis
and the percentage of faculty in the
various types is similar to the pattern of
correlations found for universities in the
United States (Richards et al., 1970) and
in Japan (Richards, 1973). That is, technological
emphasis is positively correlated
with an emphasis on realistic and investigative
fields and negatively correlated with an
artistic and social emphasis. The patterns of
correlations for size and affluence, however,
show considerable deviation from the pattern
obtained in these earlier studies. In
both the United States and Japan, size is
positively correlated with a relative emphasis
on realistic fields and is not significantly
correlated with an investigative emphasis.
(In all countries, of course, size is
positively correlated with the number of
faculty members in each of the types.) In
the British Commonwealth, size is positively
correlated with a relative emphasis
on investigative fields and is not significantly
correlated with a realistic emphasis.
Similarly, in the United States and Japan,
affluence is correlated positively with realistic
and negatively with artistic, while the
reverse pattern holds in the Commonwealth.
These differences presumably can
be attributed to differing histories of university
development in the various nations.

* It should be reemphasized that the type measures
are ipsative so the significance tests are
not entirely independent.

To check the extent to which national differences
in these relationships occur within the British
Commonwealth, the environmental measures were
correlated with the
type scores separately for universities in 
Britain and in Canada (the only nations 
with enough universities for such computa- 
tions to be meaningful). Table 3 summarizes 
the results. Again, the results suggest rea- 
sonable consistency across nations for the 
correlates of technological emphasis but 
some variation between nations for the 
correlates of size and affluence. If we inter- 
pret these results as population correlations 
for which statistical tests are irrelevant, 
several of the correlations involving afflu- 
ence are of opposite sign for Britain and 
Canada. This pattern certainly suggests 
differences in the fields in which these na- 
tions invest, but it does not indicate whether 
such differences reflect value differences or 
merely reflect historical contingencies. 

TABLE 3

CORRELATIONS OF TYPE SCORES WITH SIZE, AFFLUENCE, AND TECHNOLOGICAL EMPHASIS FOR UNIVERSITIES IN INDIVIDUAL NATIONS

                   Britain (n = 50)                            Canada (n = 48)
Type scores        Size          Affluence     Technological   Size          Affluence     Technological
No. faculty
  Realistic        [illegible]   -.24           .20             .68**         .05          [illegible]
  Investigative     .91**        -.03          -.06             .94**        -.08          [illegible]
  Artistic         [illegible]    .15          -.57**           .87**        -.18           .01
  Social           [illegible]   -.01          -.28*            .90**        -.19          -.10
  Enterprising     [illegible]   -.11          -.12             .78**        [illegible]   [illegible]
  Conventional      .56**        [illegible]   -.21             .55**        [illegible]   [illegible]
% faculty
  Realistic         .33*         -.38**         .45**           .20           .18          [illegible]
  Investigative     .28*         -.06           .42**          [illegible]    .04          [illegible]
  Artistic         -.20           .22          [illegible]     -.43**        -.13          [illegible]
  Social           -.02           .20          -.56**          -.15          -.21          [illegible]
  Enterprising      .01          [illegible]    .04            [illegible]    .22          [illegible]
  Conventional      .41**        -.32*         -.22             .22          -.21          [illegible]

* p < .05.
** p < .01.


TABLE 4

CORRELATIONS BETWEEN NATIONAL CHARACTERISTICS AND AVERAGE CHARACTERISTICS OF UNIVERSITIES WITHIN NATIONS

(Columns: characteristics of nations. Rows: average characteristics of universities, namely the environmental measures of size, affluence, and technological emphasis, followed by the number and the percentage of faculty in each of the six types. Cell values are not legible in this copy.)


The next analysis examined the question
of whether national characteristics and
university characteristics are correlated.
Sawyer (1967) found that much of the
variation among nations can be described
by just three relatively independent dimensions:
size, wealth, and political orientation
(i.e., Communist, neutral, or anti-Communist).
Measures of these three dimensions
were determined for each of 27 nations (or
political subdivisions) which included one
or more of the studied universities. Size was
measured by the total national population,
wealth by per capita income (in Canadian
dollars at 1969 exchange rates), and political
orientation, following Sawyer, by a trichotomous
variable with scores of 1 assigned
to nations having a military alliance with
the United States, scores of 2 to “neutral”
nations, and scores of 3 to Communist nations.
In the present study, of course, this
variable primarily contrasts “neutrality”
versus military alliance with the United
States.

In some cases, a judgmental element was 
involved in deciding how to treat a particu- 
lar university or geographic subdivision. 
Each campus of the University of the West 
Indies (Jamaica, Trinidad, and Barbados) 
again was treated as a separate university, 
and each island incorporating a campus was
treated as a separate nation. The same
procedure was followed for the various
campuses of the University of East Africa
(Kenya, Tanzania, and Uganda). On the
other hand, Lesotho alone was treated as the
home of the University of Botswana, Lesotho,
and Swaziland, since the only campus
is located in Lesotho.

The characteristics of nations were correlated
with the average university profiles
within nations. Results are shown in Table
4. It appears from these results that the
characteristics of universities do indeed
vary systematically with the characteristics
of British Commonwealth nations. Large
nations have large universities that emphasize
realistic fields; wealthy nations
have large, affluent universities that emphasize
artistic and conventional fields.
(Perhaps a certain amount of wealth is
necessary before it is worthwhile educating
accountants to keep track of it.) Neutral
countries tend to have small universities
which emphasize social fields. In most cases,
this means they emphasize teacher training.

Some Commonwealth universities, most 
notably Cambridge University and Oxford 
University, England, are organized more 
into relatively autonomous colleges than 
into a centralized university structure. This 
organizational pattern is especially suitable 
for exploring the extent to which Holland's 
(1973) typology also is useful for measuring 
subenvironments within universities, so the 
last analysis for this study used the typology
to examine the environments of Oxford
and Cambridge colleges. Specifically, a
count was made of the number of faculty
members at the various colleges falling into
Holland’s types. (This count included all
“members” of these colleges rather than
just “fellows.”) Because only one individual
was classified as conventional, that type was
excluded from subsequent analyses for the
colleges.

Figure 1. Spatial configuration of type and college vectors. Cambridge colleges corre-
sponding to numbers are as follows: (1) Christ’s, (2) Churchill, (3) Clare, (4) Clare Hall,
(5) Corpus Christi, (6) Darwin, (7) Downing, (8) Emmanuel, (9) Fitzwilliam, (10) Girton,
(11) Gonville and Caius, (12) Jesus, (13) King’s, (14) Lucy Cavendish, (15) Magdalene, (16)
New Hall, (17) Newnham, (18) Pembroke, (19) Peterhouse, (20) Queens’, (21) St. Catherine's,
(22) St. Edmund's, (23) St. John's, (24) Selwyn, (25) Sidney Sussex, (26) Trinity, (27)
Trinity Hall, and (28) University. Oxford colleges are as follows: (29) All Souls, (30) Balliol,
(31) Brasenose, (32) Christ Church, (33) Corpus Christi, (34) Exeter, (35) Hertford, (36)
Jesus, (37) Keble, (38) Lady Margaret Hall, (39) Linacre, (40) Lincoln, (41) Magdalen,
(42) Merton, (43) New College, (44) Nuffield, (45) Oriel, (46) Pembroke, (47) The Queen’s
College, (48) St. Anne's, (49) St. Anthony's, (50) St. Catherine’s, (51) St. Cross, (52) St.
Edmund Hall, (53) St. Hilda’s, (54) St. Hugh’s, (55) St. John’s, (56) St. Peter's, (57) Somer-
ville, (58) Trinity, (59) University, (60) Wadham, (61) Wolfson, and (62) Worcester.


A square root transformation was used
to reduce skewness. The transformed data
for colleges at Oxford and Cambridge combined
were analyzed with the configurational
analysis procedure developed by Cole
and Cole (1970). Briefly, the original
observation vectors were located in a (reduced)
space defined by principal components.
This space was then projected onto
the best fitting plane to provide a “picture”
of the spatial configuration of the original
vectors.* The distances between points in
this picture are direct measures of the
similarity between the same points (that is,
similar points are close together while
dissimilar points are far apart).

* The author is grateful to Nancy Cole for
carrying out the computations.
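
The general idea of locating observation vectors by principal components and projecting them onto a best fitting plane can be sketched as follows. This is a generic illustration using power iteration, not the Cole and Cole (1970) procedure itself; the profiles are invented and all function names are hypothetical.

```python
import math


def center(rows):
    """Subtract each column mean so the cloud of vectors is centered."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of centered data (columns are variables)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(matrix, iterations=500):
    """Power iteration: unit eigenvector and eigenvalue of largest magnitude."""
    vector = [float(i + 1) for i in range(len(matrix))]  # arbitrary start
    for _ in range(iterations):
        product = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    product = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(vector, product))
    return vector, eigenvalue


def project_to_plane(rows):
    """Coordinates of each row on the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    first, value = leading_component(cov)
    size = len(first)
    # Deflation removes the first component so the second can be found.
    deflated = [[cov[i][j] - value * first[i] * first[j] for j in range(size)]
                for i in range(size)]
    second, _ = leading_component(deflated)
    return [(sum(a * b for a, b in zip(row, first)),
             sum(a * b for a, b in zip(row, second))) for row in centered]


def distance(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Made-up profiles for four colleges on five type scores.
profiles = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 6],
    [5, 5, 1, 0, 2],
    [4, 6, 0, 1, 1],
]
points = project_to_plane(profiles)
```

In the resulting plane, colleges with similar profiles (the first two rows here) fall closer together than colleges with dissimilar profiles, which is how distances in such a picture are read.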

Figure 1 shows the resulting configura- 
tion of the vectors for Cambridge and Ox- 
ford colleges, as well as the vectors cor- 
responding to the Holland types. The 
configuration of these type vectors departs
from expectations (Holland, 1973) in that 
social and enterprising show virtually no 
separation. The configuration of colleges 
reveals a broad discrimination between 
Cambridge and Oxford. (About two thirds 
of the Cambridge colleges are to the right 
of the vertical axis and about two thirds of 
the Oxford colleges are to the left.) This 
provides some support for treating these 
universities as unitary institutions in earlier 
analyses. In general, Oxford colleges appear 
to be more artistic (i.e., to emphasize the 
liberal arts) than Cambridge colleges, while 
Cambridge colleges appear to be more 
realistic (i.e., to emphasize technical educa- 
tion) than Oxford colleges. These results, in 
combination with the underlying theory 
(Holland, 1973), imply that the interper- 
sonal environment of Oxford colleges would 
be described more by such adjectives as 
complicated, emotional, impractical, and 
intuitive, while the interpersonal environ- 
ment of Cambridge colleges would be more 
described by such adjectives as frank, 
stable, masculine, and uninvolved. There is 
considerable interpenetration, however, with 
some Oxford colleges being more typical of 
Cambridge, and vice versa. Oxford colleges 
appear to be considerably more heteroge- 
neous than Cambridge colleges. The most 
distinctive college seems to be Oxford’s 
Nuffield, which appears to emphasize social 
and enterprising fields (e.g., the social sci- 
ences and economics). Similarly, St. Cross 
and Wolfson at Oxford and Clare Hall at 
Cambridge are relatively distinctive, prob- 
ably in an emphasis on science and technol- 
ogy. In view of such differences, an obvious 
question is the extent to which the con- 
figuration shown in Figure 1 corresponds to 
the experience of persons at Oxford and 
Cambridge. 


Discussion 


Overall, the results indicate that in the 
British Commonwealth (a) universities dif- 
fer significantly on the environmental mea- 
sures, (b) different measures of the univer- 
sity environment are meaningfully related 
to each other, and (c) university characteristics
and characteristics of nations correlate fairly
meaningfully. Therefore, the results suggest that
these measures are appropriate and useful for
characterizing Commonwealth universities and might
now be used to investigate such problems as the
relationship between objective and phenomenological
university environments or the relationship between
university environment and differential impact on
students in the Commonwealth. In such studies, the
underlying psychological theory (Holland, 
1973) could provide a fruitful source of 
hypotheses amenable to empirical testing. 
The results are generally consistent with 
earlier studies of U.S. and Japanese universities
(Richards, 1973; Richards et al., 1970) but with
enough variation to support
the appropriateness of these techniques for 
studying national differences in university 
environments. Therefore, this study helps 
provide a broader, more international con- 
text for studying university characteristics. 
Because all Commonwealth universities are 
influenced to some degree by a common 
heritage, it would be highly desirable to 
study universities in a larger, more inclusive 
set of nations. Some of the results for 
British Commonwealth universities should 
hold for universities in all nations; that is, 
large nations should generally have large 
universities, and wealthy nations should 
have large, affluent universities. Also, Hol- 
land’s theory (1973) and Maslow’s (1954) 
notion of a hierarchy of needs would lead 
to the expectation that universities in
wealthy nations would place relatively high
emphasis on artistic fields. Other results of
this study might be more specific to the
British Commonwealth and, therefore,
might be expected to change in a broader
sample of nations. For example, developing
nations would be expected to place heavy
emphasis on development of agriculture and
major civil engineering projects, such as
roads, dams, and the like. In a broader
sample of nations, this might produce a
substantial negative correlation between
national wealth and emphasis on education for
realistic fields. Similarly, the most obvious
bias in this set of Commonwealth nations is
the absence of nations explicitly committed
to the Communist ideology. Such nations
might be expected to place more emphasis
than other nations on engineering related to
the needs of industry and on the physical
sciences. Therefore, national political
orientation (using this study's scoring system)
might be positively correlated with an 
emphasis on education for realistic and in- 
vestigative fields if more Communist na- 
tions were included. 

Finally, universities may not provide a 
single environment but rather multiple sub- 
environments. The results for Oxford and 
Cambridge colleges obtained in this study 
suggest that these same techniques can be
useful for describing such subenvironments. 
UNFAMILIAR STIMULUS TERMS IN CHILDREN'S
PAIRED-ASSOCIATE LEARNING 


ROBERT E. DAVIDSON¹
Verbalization and/or imagery processes facilitate paired-associate
learning. The degree of facilitation seems to depend on how well
sentences or images unitize the stimulus and response terms. Unfamiliar
or nonsense stimulus terms may not be able to play a role in such
processes unless they are “concretized.” A total of 64 third-grade pupils
learned 10 paired associates with nonsense words serving as stimuli
and familiar nouns serving as responses (e.g., latuk-boat). One half
of the subjects looked at concretizing pictures of the paired items and 
heard sentences or labels only. Sentences without pictured stimuli im- 
paired learning, while sentences with pictures facilitated learning sig- 
nificantly. Perhaps a sentence serves to assure the triggering of a com- 
pound image that unitizes the terms. 


Verbalization and/or imagery processes 
facilitate paired-associate learning, often 
dramatically (Davidson & Adams, 1970). 
That assertion holds under many methodo- 
logical variations, variations that have yet 
to be exhausted. However, despite the large 
numbers of studies that have been generated 
in this area of research (Paivio, 1971), it is 
not clear why the processes work so well.
Indeed, it is not yet clear whether the ob- 
served facilitation follows from the opera- 
tion of two independent processes (Paivio, 
1971) or from a single process (Rohwer, 
1972). 

The research to date does seem to indi- 
cate that the magnitude of the facilitation 
effect varies as a function of how well the 
paired items are unitized, either by the ex- 
perimenter or by the learner himself. Unit- 
ization under verbalization strategies usu- 
ally takes the form of a simple sentence or 
prepositional phrase that asserts a relation- 
ship between the stimulus and response 
terms (Davidson & Dollinger, 1969; Roh- 
wer, 1966). Unitization under an imagery 


1 Requests for reprints should be sent to Robert 
E. Davidson, Department of Educational Psy- 
chology, School of Education, University of Wis- 
consin, Madison, Wisconsin 53706. 
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strategy either encourages in the learner 
displays for the learner spatial/dynami 
motorie relationships (Bower & Winzel 
1970; Davidson & Adams, 1970; Rohw 
1967; Wolff & Levin, 1972). Most of the 
search on verbalization and imagery pr 
esses has used as stimulus materials rel 
words (usually nouns) that may vary on 
abstract-concrete dimension. Intraverb 
processes are said to be important for t 
learning of abstract nouns, while image 
processes are preferred in the presence 
concrete nouns (Paivio, 1971). In particular, 
if the stimulus term is a familiar cone! 
noun, it ean act as a "conceptual peg 
which the response term can be hooked. 
conceptual-peg analysis extends to nonwoi 
consonant-vowel-consonant combinations? 
the experimental operations employ 
heighten the “concreteness” of stimi 
(Philipehalk & Begg, 1971). E i 
One aspect of the present study exami 
the role of “heightened” concreteness of k 
familiar stimulus terms. One half of Wi 
learners saw pictures of specially crea 
objects that served as referents for oi. 
ficial dissyllable “words” (Noble, 1 
Certainly there is nothing in cognitive 
ories of imagery to suggest that only 8em 
tically familiar objects can function 
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conceptual pegs. Therefore, pictorial rep- 
resentation of unfamiliar items was ex- 
pected to boost performance considerably. 

Another aspect of the study examines the 
role of verbalization for unfamiliar stimuli. 
Here, the prediction would seem to be 
equally clear. Verbalized “sentences” that 
depart from normal structures, either semantic
or syntactic, do not facilitate learning
(Rohwer, 1966; Rohwer & Levin, 1968). On
the other hand, an artificial word serving as
a stimulus term may be analogous to an
abstract word, and the sentence or intraverbal
assertion (i.e., the verb) might serve
to attribute meaning to the unfamiliar word.
In that case, sentences (as opposed to labels
only) should facilitate learning. 


METHOD 


Subjects and Design 


Sixty-four third-grade children, 32 boys and
32 girls, were recruited from one semirural public
school. In addition, 50 third- and fourth-grade
pupils were used to norm the unfamiliar, pictured
stimuli. The 64 experimental subjects served in a
2 × 2 × 2 × 5 (Sex × Picture versus No Picture ×
Label versus Sentence × Trials) repeated
measures design. The first three were between-
subjects factors, and the fourth was a within-
subjects comparison.
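
As a worked check on this design, the cell size and the error degrees of freedom that appear in the F ratios reported later can be derived from the numbers given above. The sketch below assumes equal cell sizes; the function name is hypothetical.

```python
from math import prod


def design_df(between_levels, within_levels, n_subjects):
    """Degrees of freedom for a mixed design with one within-subjects factor."""
    cells = prod(between_levels)          # number of between-subjects cells
    error_between = n_subjects - cells    # subjects within cells
    trials_df = within_levels - 1
    return {
        "subjects_per_cell": n_subjects / cells,
        "error_between": error_between,
        "trials": trials_df,
        "error_within": error_between * trials_df,
    }


# Sex x Picture x Verbalization between subjects, five trials within.
df = design_df([2, 2, 2], 5, 64)
```

The result is 8 subjects per cell, 56 error degrees of freedom for between-subjects effects, and 4 and 224 degrees of freedom for effects involving trials, which matches the df = 1/56 and df = 4/224 reported in the results.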


Materials 


Ten dissyllable nonsense words (Noble, 1952)
serving as stimulus terms were paired with 10
common object nouns. The pairs were as follows:
perkil-book, bydoe-chair, latuk-boat, arkot-TV, 
rodig-tree, zumap-car, gokem-shoe, feemit-star, 
neglan-bed, and ookus-house. 

For the verbal elaboration conditions, the pairs
were embedded in simple declarative sentences
with stimulus and response terms functioning as
subject and direct object, respectively. Transitive
verbs from a list of 1,000 common words were
assigned at random to the paired words (e.g., “The
perkil builds the book”; “The feemit keeps the
star”; etc.).

For the pictorial conditions, line drawings were
produced for both the stimulus and response
terms. In the case of the stimulus terms, pictures
of imaginary objects were created and paired at
random with the nonsense labels. Pictures of the
paired items were displayed side by side on 21 ×
[illegible] centimeter manila cards. The unfamiliar
stimulus picture was always to the subject’s left.

In order to assess the degree of unfamiliarity 
of the pictured imaginary objects, reproductions 
of the stimulus cards which had response lines 
added were given to pupils in third-fourth-grade 
combination classes. The subjects were instructed
to produce associations to the unfamiliar stimulus 
picture (i.e., “Write down the words that come
to your mind when you see it. What words does 
it make you think of?"). The subjects were told 
to “ignore” the familiar response picture which was 
also displayed. Asking the subjects to produce 
responses when both pictured objects were pres- 
ent was thought to approximate more closely the 
conditions under which the experimental pictorial 
groups learned the materials. The norming sub- 
jects were given 30 seconds per picture to write 
their responses. Two associative meaningfulness 
scores were calculated: Noble’s (1952) m, the 
number of free associates in a unit of time, and 
Howe’s (1972) N, the number of different free 
associates. The range of m values was 1.36-2.44
(M = 2.07); the range of N was 12-42 (M =
24.2). By way of example, for N the object labeled
“latuk” produced 15 different associates with 
“rope” as a primary. 
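
The two associative meaningfulness scores described above can be illustrated with a short sketch. It assumes that m is taken as the mean number of associates written per norming subject in the allotted time and that N is the count of different associates across all subjects; the response lists are invented and the function name is hypothetical.

```python
from collections import Counter


def meaningfulness(responses_by_subject):
    """Return (m, N, primary) for one stimulus picture.

    m: mean number of associates per subject (assumed reading of Noble's m).
    N: number of different associates across subjects.
    primary: the most frequently given associate.
    """
    cleaned = [[word.strip().lower() for word in responses]
               for responses in responses_by_subject]
    m = sum(len(responses) for responses in cleaned) / len(cleaned)
    counts = Counter(word for responses in cleaned for word in responses)
    primary = counts.most_common(1)[0][0]
    return m, len(counts), primary


# Made-up responses from three norming subjects to one picture.
responses = [["rope", "boat"], ["Rope"], ["net", "rope", "string"]]
m, n_different, primary = meaningfulness(responses)
```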


Procedure 


The subjects participated individually. In- 
structions and practice with two example pairs 
were given to each subject prior to the experiment. 
The examples used only familiar object names 
(and pictures for the appropriate groups). The
instructions pointed out specifically that the 
actual learning pairs would include names (or 
names and pictures) of “things” that the subject 
had never heard of (seen) before. Subjects were 
given successive study-test trials to a criterion of 
9/10, but a minimum of 5 and a maximum of 10 
trials were administered. Order of presentation of 
pairs was varied randomly over trials. In all con- 
ditions, the experimenter orally presented the 
labels or sentences, showing the pictures when 
appropriate for those conditions, and the subject 
repeated aloud what the experimenter said. The 
rate of presentation was 5 seconds for both the 
study and test portions of each trial, with no 
intratrial interval. At the end of the test portion 
of each trial, the subject was told how many 
correct responses he had made during that trial 
but not which items were correct.
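
The stopping rule for the study-test trials can be sketched as follows. The sketch assumes that the criterion of 9 correct out of 10 counts once it has been reached on any trial, that testing then continues only until the minimum of 5 trials has been given, and that testing stops at 10 trials regardless; the function name and the score lists are hypothetical.

```python
def trials_administered(correct_by_trial, criterion=9, min_trials=5, max_trials=10):
    """Number of study-test trials a subject would receive."""
    reached = False
    for trial, correct in enumerate(correct_by_trial[:max_trials], start=1):
        reached = reached or correct >= criterion
        if reached and trial >= min_trials:
            return trial
    return min(len(correct_by_trial), max_trials)


# Made-up score sequences (number correct out of 10 on each trial).
early = trials_administered([3, 5, 9, 8, 7, 9, 10])
late = trials_administered([2, 3, 4, 5, 6, 7, 9, 9])
never = trials_administered([1] * 12)
```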


RESULTS


Numbers correct at each trial and per- 
centages of response intrusions were the 
dependent measures. For items correct, sex 
was not a significant main effect (F < 1). 
The main effect for type of verbalization 
(label or sentence) was not significant 
(F = 2.71, df = 1/56). The pictorial factor 
(picture or no picture) was significant as a 
main effect (F = 147.51, df = 1/56, p < 
.001), and it interacted with trials (F = 
15.30, df = 4/224, p < .001). Providing
pictorial representations of the unfamiliar
stimulus terms produced greater facilitation,
and the facilitative effect increased
differentially over trials.

Figure 1. Acquisition curves for the four
experimental conditions.

The pictorial and verbalization factors 
interacted significantly (F = 15.47, df = 
1/56, p < .001). The nature of the inter- 
action may be inferred by an inspection of 
Figure 1; that is, sentence elaboration of 
pictured items produced the best perform- 
ance in acquisition, while sentences in the 
no-picture condition produced the poorest 
performance. 

Percentages of response intrusions, based 
on opportunities (total presentations minus 
correct responses), were analyzed following 
arc sine transformation. Only the main 
effect for type of verbalization was significant
(F = 5.73, df = 1/56, p < .05). Percentage
of response intrusions for labels and
sentences was 51.5 and 40.1, respectively.
Sentences act to reduce significantly the
number of overt errors. A similar finding
was reported by Davidson and Adams 
(1970). 
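
The intrusion measure described above can be sketched numerically. The snippet is an illustration only: the counts are invented, the function names are hypothetical, and the transformation is assumed to be the common arcsine-square-root form, since the report does not say which variant was applied.

```python
import math


def intrusion_proportion(intrusions, presentations, correct):
    """Intrusions divided by opportunities (presentations minus correct)."""
    opportunities = presentations - correct
    if opportunities <= 0:
        raise ValueError("no opportunities for an intrusion")
    return intrusions / opportunities


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Invented counts for one hypothetical subject (not data from the study).
p = intrusion_proportion(intrusions=12, presentations=50, correct=26)
score = arcsine_transform(p)
```

The transformed scores, rather than the raw percentages, would then enter the analysis of variance, because the variance of a raw proportion depends on its mean.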

It was of interest to analyze the relation- 
ship between learning and the associative 
meaningfulness measures m and N, espe- 
cially in view of Howe’s (1972) assertion 
that the use of N permits prediction of 
learning according to the function: Learning
= f(1/N). Spearman rank correlations
were calculated between stimulus item rank 
on each meaningfulness measure and item 
rank in acquisition, that is, the number of 
times the item was given correctly in the
first three trials of learning (picture groups
only). Neither correlation was significant
[r_s(m) = .24 and r_s(1/N) = −.51].
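
A rank correlation of this kind can be sketched as follows. The item values below are invented for illustration and the helper names are hypothetical; the code simply ranks both variables (tied values share the mean rank) and takes the product-moment correlation of the ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: N (different associates) and times correct per item.
n_values = [12, 15, 20, 30, 42]
times_correct = [9, 7, 8, 4, 2]
rho_inverse_n = spearman([1 / n for n in n_values], times_correct)
rho_n = spearman(n_values, times_correct)
```

Because 1/N reverses the ordering of N, the two coefficients differ only in sign.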


Discussion 


Certainly the most interesting effect
found in this study is the significant interaction
between the pictorial and verbalization
conditions. The “worst” condition for
learning is sentences alone, without pictures,
but the “best” condition is sentences with
pictures. It could be hypothesized that under
a sentence-no-picture condition, the
subject attempts to generate his own mediational
devices. Not only are sentences provided
by the experimenter not helpful to
the subject, but they might also actually
interfere with his attempt to form his own
mediational devices. Indeed, the sentences
may interfere with an out-and-out [...]
process. However, when a sentence is provided
in the context of a picture (sentence-picture),
the sentence may act to unitize
the concretized terms by way of a compound
image. Unitization of the pictured
pairs may occur without sentences, but perhaps
sentences serve to assure the [...]
of the compound image that facilitates
learning.

Sentences were also shown to reduce overt
errors, but why sentences act to reduce [...]
responding is not clear. It may be the case
that a conceptual-peg imagery encoding
generates overconfidence in the subject,
which allows him to respond in the [...]
knowledge that he is correct when [...]
he is not. However, imagery with concomitant
sentence encoding, specifically the
transitive verb, may act to limit the subject’s
responses to a smaller or more discriminable
pool of items. That is, the verb
might allow the subject to perform an editing
function on an incipient response, which
assures him that the response is incorrect,
and he omits it. A verb constraint explanation
is not entirely satisfying, however, in
view of the results reported by Rol[...]
(1967) wherein a constraint hypothesis was
essentially ruled out as an explanation of
the sentence facilitation effect in paired-associate
learning.
Students in 10 first-year courses were asked to rate their instructors.
The mean ratings for each class were correlated with the mean class 
mark on final, common, board-marked examinations. The mean cor- 
relation was +.39, and correlations were both positive and higher than 
+.32 in all but 2 of the courses. Correlations were higher for experienced
full-time faculty members than for inexperienced faculty members
and lowest for inexperienced part-time instructors. Academically
successful and highly evaluated instructors were both “task oriented”
and interest arousing. Unsuccessful but highly evaluated instructors
attempted to arouse interest without being task oriented. However,
electing to take subsequent courses in the subject and the level of
achievement in these courses were more highly related to the subject’s
level of achievement in the first course than to evaluation of
instructor.


Although considerable data have been 
collected concerning the reliability of 
student ratings of instructors, relatively 
little information is available concerning 
the validity of student evaluations when the 
criterion is the amount which students 
learn from a course. Most studies of validity 
have used correlations with peer ratings or 
supervisor ratings as the criterion. 

When final grade is used as the measure 
of achievement, approximately half of the 
studies reported show positive correlations 
between grades and student evaluations 
while the remaining half show no cor- 
relation or negative correlations between 
the two (Costin, Greenough, & Menges, 
1971). 

When a more objective measure of final 
achievement has been used as the criterion 
of achievement, for example, score on an 
achievement test, positive but modest cor- 
relations have usually been reported (El- 
liott, 1950; Morsh, Burgess, & Smith, 1956). 


1 The research reported in this paper was sup- 
ported by Canada Council Research Grant S71- 
1869. 

2 Requests for reprints should be sent to Arthur
M. Sullivan, Department of Junior Studies, Me- 
morial University of Newfoundland, St. John's, 
Newfoundland, Canada. 


Using a university mathematics course
taught by teaching assistants, Rodin and
Rodin (1972) found, however, a correlation
of −.75 between a rating of overall performance
and final grade, as determined by
the number of criterion problems passed,
and with the effect of prior achievement removed.
They concluded that students are
not able to judge teaching effectiveness if
the latter is measured by how much they
have learned. This conclusion has been
challenged by Gessner (1973) who found,
in the case of sophomore medical students
taking a one-semester science course, correlations
of .74 and .62, respectively, between
student rating of content and organization
and of presentation and class
performance on national normative examinations.
However, no significant correlation
was found between student ratings and performance
on institutional examinations.
The Rodin and Rodin study has also been
challenged by Frey (1973), who found a
positive correlation between instructor
rating and students’ examination performance
of .958 and .951 for two courses in
mathematics.

In addition to the contradictory nature of
these reports, it is difficult to generalize
from the studies which have been reported
because in most cases only one course was
investigated and in many cases the instructors
were teaching assistants rather than
full-time experienced instructors. Also, the
measures of achievement used were often
not those commonly used or readily understood.

At Memorial University of Newfound- 
land, Canada, we have been able to take 
advantage of circumstances which enabled 
us to carry out a study involving a large 
number of instructors, many of whom were 
full-time experienced faculty members, and 
an objective and very common measure of 
achievement. 

This study was designed to find out if, 
under these circumstances, students were 
able to identify those instructors from 
whom they learned most. 

A second consideration was to find 
out the qualities which differentiated the 
successful and highly evaluated instructors 
from those who were not successful or 
highly evaluated. 


METHOD 


Setting and Materials 


The Junior Division of Memorial University is
an administrative unit within the university which
is responsible for the planning, presentation, and
evaluation of first-year university and special preparatory
(foundation) courses. Foundation courses
are offered in those subjects (English, mathematics,
physics, chemistry, biology) in which it is important
that the student have a good background
before beginning a university level course but in
which, because of variations in high school facilities
and instruction, some incoming students have
either no prior preparation or an inadequate one.

Successful teaching experience is weighed heavily
in making appointments to the division and,
therefore, most faculty members have considerable
experience in teaching in the high school system.
Some faculty members have extensive experience
elsewhere but have little experience in our high
school system, and a few faculty members have
no previous teaching experience. In one subject
(psychology), a number of graduate students are
employed as part-time teaching assistants.

Although the first-year intake is in excess of
2,000 students, a sufficiently large number of faculty
members have been hired to permit small
classes. Students are usually taught in classes or
sections with between 30 and 40 members. All
subjects have several sections and those with the
largest enrollment (English, psychology) have as
many as 45 separate sections and 20 different instructors.

Because of the large number of sections and be- 
cause of the importance of maintaining standards 
at the first-year level, many first-year subjects 
have examination committees. These committees 
usually include some Junior Division faculty mem- 
bers and some Senior Division faculty members 
who are involved in the teaching of the second- 
year courses. Early in each semester, the examina- 
tion committee makes up and circulates a model 
of the final examination, but the final examination 
itself is not circulated and its specific content is 
not known to any except the committee members 
until the examination is written. The committee 
also sets out guidelines to be used on marking each 
answer. All answers are board-marked with a small 
group of faculty members marking one question 
on all papers. The marks on the final examination 
are then tabulated simply by adding the raw scores 
on each question. The mark on the final examina- 
tion counts 50% of the final grade. The remaining 
50% is determined by term work and is assigned 
by the instructor. The mark received on the final 
examination for those subjects in which board- 
marking is used is, therefore, probably as objective 
a measure of student achievement as it is possible 
to get in an applied learning situation. 

During the first semester of the 1971-1972 aca- 
demic year, a student evaluation form was pre- 
pared. The form contained an evaluation section 
which included the following questions: 


1. As an overall rating, would you say the
instructor is

Excellent                              Poor

2. Specifically, how would you rate his (her):

(a) interest in students

(b) ability to present material in a clear and
easily understood manner
5    4    3    2    1

(c) ability to get students really interested
in the subject
5    4    3    2    1


The other section contained specific statements
which described the performance of the instructor
and were intended primarily to provide feedback
to him.³

Therefore, it was possible to obtain from each
student a relatively objective measure of achievement,
that is, the final examination mark; and
since the rating had been done anonymously, we
were able to find the mean rating for each section
on the item which measured overall competence,
the three items which measured specific characteristics,
and the mean proportion of students who
answered yes to each of the descriptive items.


3 Copies of the evaluation form may be obtained
from the senior author.


TABLE 1
CORRELATION BETWEEN INSTRUCTOR RATING AND
FINAL EXAMINATION ACHIEVEMENT
(DECEMBER 1971)

Subject              No. sections    r
Biology 100F               6         −.2767
Biology 1010              14          .4157
Chemistry 100F             8          .0785
Chemistry 1000             6          .5535
Mathematics 100F           8          .4712
Mathematics 1010          16          .3379
Mathematics 1150           9          .3270
Physics 1050               9          .5720
Psychology 1000           40          .4009**
Science 115A              14          .5106*
Total                                 .394**

* p < .05.
** p < .01.


Procedure 


In September of 1971, approximately 2,300 first- 
year students registered and selected their courses. 
In each course, students were assigned randomly 
to one of the available sections. During the tenth 
week of the semester, the course evaluation form 
was completed anonymously. At the end of the 13- 
week semester, all students wrote a final examina- 
tion and received a grade for the course. 

The following subjects used a final, common 
examination which was board-marked and which 
counted 50% toward the final grade: Biology 100F, 
Chemistry 100F, Mathematics 100F (special pre- 
paratory courses for those students whose Grade 
11 mark in the subject was less than 70% or who 
had not taken the subject in high school); Biology 
1010, Chemistry 1000, Mathematics 1010, Physics 
1050, Psychology 1000 (regular first-year courses);
and Mathematics 1050, Science 115A (special first- 
year courses for those intending to become teach- 
ers of primary and elementary grades). 

The final examination mark of each student was 
tabulated. For each section, the mean mark on 
this measure and the mean rating on Question 1 
(overall competence) of the evaluation were cal- 
culated. Correlations were then obtained between 
mean instructor rating and mean final examina- 
tion marks. 
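
The section-level analysis described here can be sketched as below. The records are invented and the function names are hypothetical; the point is only that the correlation is computed across section means (one pair of values per section), not across individual students.

```python
import math
from collections import defaultdict


def section_means(records):
    """records: iterable of (section, rating, exam_mark) per student.

    Returns {section: (mean_rating, mean_mark)}.
    """
    ratings, marks = defaultdict(list), defaultdict(list)
    for section, rating, mark in records:
        ratings[section].append(rating)
        marks[section].append(mark)
    return {
        s: (sum(ratings[s]) / len(ratings[s]), sum(marks[s]) / len(marks[s]))
        for s in ratings
    }


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented student records for three hypothetical sections of one course.
records = [
    ("A", 2, 45), ("A", 4, 55),
    ("B", 4, 58), ("B", 4, 62),
    ("C", 5, 70), ("C", 5, 70),
]
means = section_means(records)
r = pearson([m[0] for m in means.values()], [m[1] for m in means.values()])
```

With only a handful of sections per course, each coefficient rests on very few points, which is one reason the individual course correlations vary so widely.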

Since students had been assigned to them on a 
random basis, it was reasonable to assume that 
sections were similar in initial achievement, and 
in fact, remarkably little variability among sec- 
tions was found in average high school marks in 
those subjects for which these marks were readily 
available. 
RESULTS AND DISCUSSION


The correlations between student ratings 
and final examination marks are given in 
Table 1. 

Of the 10 correlations, it can readily be 
seen that 9 are positive and 8 are above 
+.32. The average correlation is +.39. A 
test for significance of a number of inde- 
pendent tests of the same hypothesis 
(Winer, 1971, p. 49) revealed a significant 
overall positive effect (χ² = 44.13, df = 20,
p < .05). In general, then, these results indicate
that there is a modest, but significant,
relationship between student evaluation 
of instruction and student achievement. 
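
The combined test can be illustrated under one assumption: that the procedure cited from Winer is the chi-square method for pooling independent probabilities, in which χ² = −2 Σ ln p over k tests with 2k degrees of freedom (10 courses would give the reported df = 20). The per-course probabilities are not reported, so the sketch below uses invented p values for the pooling step and then checks only the reported statistic. Function names are hypothetical.

```python
import math


def combined_chi_square(p_values):
    """Pool independent p values: statistic = -2 * sum(ln p), df = 2k."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, 2 * len(p_values)


def chi_square_upper_tail(x, df):
    """Upper-tail probability of chi-square; closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Invented p values, one per hypothetical course.
stat, df = combined_chi_square([0.5] * 10)

# Tail probability for the statistic reported in the text.
p_reported = chi_square_upper_tail(44.13, 20)
```

On this reading the reported value of 44.13 lies well beyond the .05 criterion, which agrees with the conclusion drawn in the text.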

The relatively small magnitude of the 
correlation coefficient in this study may be 
partially accounted for by the restricted 
range for both variables. Faculty members 
were, in general, evaluated highly, with lit- 
tle variability. The mean rating was 3.7 and 
the standard deviation of the ratings was 
.41. Similarly, the range of achievement
among the sections was narrow. For most of 
the courses, then, the instructors formed a 
homogeneous group, yet although the over- 
all significant positive correlation was low, 
it was a clear indication of the ability of 
students to identify those teachers from 
whom they learn most.

In subjects where the population of instructors
was more heterogeneous (e.g.,
Science 115, in that three of the eight instructors
were teaching in our university for
the first time), the correlation was relatively
high, indicating that in this situation
students find it easier to make a valid assessment.
Further support for this interpretation
and additional information concerning
the factors involved in the validity of
student evaluations may be found in an
examination of the results in Psychology
1000. This subject has the largest number
of sections and the greatest heterogeneity
among instructors. Of the 40 sections included
in this report, 27 were taught by
full-time instructors and the remaining 13
by graduate students who were hired as
part-time teaching assistants. The course
consists of 10 modules which are taught in
a highly structured manner with clearly
specified objectives and frequent evaluations
(Sullivan, 1969), but each instructor
has considerable responsibility for the
specific instructional material which is used
and the method of presentation.

The relationship between evaluation and 
student achievement was tabulated, and 
correlation coefficients were calculated for 
the entire group and separately for the sub- 
groups of full-time instructors and graduate 
students. For the group of full-time instruc- 
tors, the correlation is +.528, which is 
significant at the .01 level. For the graduate 
students, the correlation is +.007 which 
does not differ significantly from zero. The 
overall correlation between rating and final 
examination mark for all 40 sections is 
+.401, which is significant at the .01 level. 
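
The report does not say how these coefficients were tested. A minimal sketch, assuming the usual t test for a correlation with n − 2 degrees of freedom and taking n to be the number of sections in each group (27 full-time, 40 overall), is given below; the function name is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_full_time = t_for_correlation(0.528, 27)  # full-time instructors, df = 25
t_all = t_for_correlation(0.401, 40)        # all sections, df = 38
```

Each t value would then be referred to a t table at the appropriate degrees of freedom. The same coefficient yields a larger t as the number of sections grows, so the groups cannot be compared on r alone.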

The difference between the validity of 
ratings for part-time and full-time instruc- 
tors is important and interesting. The two 
factors which appear to be of most im- 
portance in explaining the difference are the 
nature of the commitment to teaching and 
the amount of experience. 

Full-time instructors are heavily com- 
mitted to teaching and must accept major 
responsibility for the outcome of their in- 
struction. A high level of student achieve- 
ment is, therefore, a primary goal for them, 
and since they have considerable autonomy 
in the actual process of instruction, they 
are able to accept a high degree of re- 
sponsibility for the attainment of this goal. 
Part-time teaching assistants and graduate 
students are not heavily committed to 
teaching and do not usually have much 
autonomy in planning the process of in- 
struction or in carrying out their specific 
responsibilities. Student achievement may 
not be a primary goal for them. They may 
be more interested in alternative goals, such 
as getting along well with the students they 
teach or arousing interest in stimulating 
topics which are not necessarily relevant to 
the major objectives of the course. Student 
evaluation, therefore, might be based on the 
success that they have in attaining these 
other goals rather than on achievement. 

This point has also been noted by Gessner 
(1973), who commented on the minimal 
level of responsibility for student achievement
which characterized the teaching
assistants in the Rodin and Rodin (1972)
study.

The other factor, amount of experience, is 
also of considerable importance. When we 
divided our full-time instructors in psy- 
chology into those who were in their first 
year of full-time teaching (inexperienced) 
and those with one or more years of full- 
time teaching (experienced), we found that 
the correlation between evaluation and 
achievement was significant (p < .01) for 
the experienced (r = .685) but not the in- 
experienced (r = .132) instructors. This 
finding suggests that valid ratings are easier 
to obtain in the case of experienced instruc- 
tors who presumably have developed a more 
consistent style of teaching than inexperi- 
enced instructors. 

These data may help to provide an ex- 
planation of the contradictory results which 
have been reported in previous studies. The 
Rodin and Rodin (1972) results are based 
on part-time teaching assistants, while those 
of Gessner (1973) and Frey (1973) are 
based on full-time experienced instructors. 
It is likely that most of the studies which 
have reported no correlation or a negative 
correlation between student evaluation and 
achievement have used results from part- 
time or inexperienced instructors, while 
those which have reported positive cor- 
relations have gathered data from full-time 
and experienced faculty members. 

On the basis of the results of this and 
other studies referred to above, it appears 
that under certain circumstances students 
are able to provide an accurate estimate of 
the amount they learn from an instructor 
and that student evaluations may provide a 
valid indication of the amount which the 
students have learned. Valid ratings are 
much more common and are easily obtained 
in the case of experienced and full-time in- 
structors than in the case of inexperienced 
or part-time instructors. 

In an attempt to find the characteristics 
associated with effective and highly evalu- 
ated teaching, the instructors in those sub- 
jects which have the largest number of 
sections (biology, mathematics, and psy- 
chology) were divided into four groups at 
the medians of the achievement and evaluation
dimensions. Of the resulting four
groups, two groups, the high-achievement- 
high-evaluation group and the low-achieve- 
ment-low-evaluation group may be de- 
scribed as “consonant.” The other two, the 
high-achievement-low-evaluation group and 
the low-achievement-high-evaluation group 
may be described as “dissonant.” 
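
The four-way division can be sketched as a double median split. The instructor values are invented, the names are hypothetical, and the handling of scores exactly at the median (assigned here to the low group) is an assumption, since the report does not say how such cases were treated.

```python
from statistics import median


def classify_instructors(instructors):
    """instructors: {name: (mean_exam_mark, mean_rating)}.

    Returns {name: (achievement_level, evaluation_level, kind)}.
    """
    achievement_median = median(a for a, _ in instructors.values())
    evaluation_median = median(e for _, e in instructors.values())
    groups = {}
    for name, (achievement, evaluation) in instructors.items():
        a = "high" if achievement > achievement_median else "low"
        e = "high" if evaluation > evaluation_median else "low"
        kind = "consonant" if a == e else "dissonant"
        groups[name] = (a, e, kind)
    return groups


# Invented values for four hypothetical instructors.
groups = classify_instructors({
    "I1": (70.0, 4.2),
    "I2": (72.0, 3.1),
    "I3": (55.0, 4.5),
    "I4": (50.0, 3.0),
})
```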

An examination of the mean scores for 
the instructors in each quadrant on the
specific evaluation questions (i.e., interest
in students, clarity of presentation, and 
ability to arouse interest) and the percent- 
age of students answering yes to each of the 
descriptive items revealed the following: 

1. Positive answers to items associated
with attitude toward students, for example,
“is friendly and understanding” and “is
usually available for help,” had surprisingly
little relation to either favorable evaluation 
or a high level of student achievement. 

2. Positive answers to items associated 
with clarity of presentation, for example, “is
well prepared,” “uses the blackboard effectively,”
and “explains things clearly,” were
related to a favorable evaluation but were 
not necessarily associated with a high level 
of student achievement. 

3. Positive answers to items associated 
with “task orientation,” for example, “ex- 
pects too much from students” and “lectures 
present too much material,” were not re- 
lated to a favorable evaluation but were as- 
sociated with a high level of student 
achievement. 

Incidentally, the items associated with a 
high level of achievement differed somewhat 
from subject to subject. In psychology and 
biology, success was associated with direct 
and continuing pressure for the student to 
work hard, whereas in mathematics, success
tended to be associated with a supportive 
orientation which helped to prevent the 
students from becoming discouraged. 

When a comparison is made for all sub- 
jects between the “consonant” instructor 
groups (i.e., those who are high on achievement
and evaluation versus those who are
low in both), it is apparent that successful 
and highly evaluated instructors are able to 
combine “task orientation” with the quali- 
ties associated with high evaluation. Thus, 
successful instructors are able to combine
task orientation with the following: clarity
of presentation, the encouragement of independent
thinking, the expression of different
points of view, and the ability to convey
enthusiasm for the subject. Unsuccessful
and poorly evaluated instructors, on the
other hand, are obviously not able to present
the material clearly, to arouse enthusiasm
for the subject, or to induce the students to
work hard.

When a comparison is made between the
“dissonant” instructor groups, that is, those
high on one dimension but not on the other,
interesting findings emerge. Some examples
of these are as follows.

Positive answers to questions such as
“conveys enthusiasm for his subject” and
“encourages the expression of different
points of view” tend to be associated with a
low rather than a high level of achievement.
These “dissonant” instructors, then, were
not able to combine arousing enthusiasm
for the subject with task orientation. Some,
the high-achievement-low-evaluation group,
placed most emphasis on student achievement
while others, the low-achievement-high-evaluation
group, placed greater emphasis
on arousing interest and enthusiasm.

Students may at times, often in the case
of a “dissonant” instructor, have an unfavorable
reaction to a successful and
achievement-oriented instructor. He may be
seen as a hard and demanding taskmaster
who discourages questions and comments
about interesting but irrelevant material in
his determination to ensure that all of the
required material of the course is covered
and that high standards are maintained.

On the other hand, the highly evaluated
but academically less successful instructor
obviously impresses the students by the efforts
which he makes to arouse interest and
enthusiasm and to generate discussion, but
he does so at the expense of the standard of
work which he requires of the students. Although
it is difficult to relate these findings
to those obtained from other studies, since
such data are rarely reported, these results
are similar to and entirely consistent with
those reported by Peck and Veldman (19[...]).


VALIDITY OF STUDENT EVALUATION


Many will argue that it is of more
fundamental importance for the students to
acquire an interest in the subject in a first
course than for them to acquire information
or knowledge. The argument states that
students who acquire information but not
interest are “turned off” from the subject
while those who acquire an interest are
“turned on” and, therefore, elect to do
further courses in the subject and to attain
a higher level of achievement in these subsequent
courses.

It is, of course, difficult to provide evi- 
dence which is crucial to this argument, but 
we were able to gather some relevant data 
by examining the second-year academic 
performance of those students who had completed
and passed Psychology 1000 in 1971-1972.
Table 2 gives the percentage of
students taught by each of the four groups 
of instructors who elected to take a second- 
year course in psychology and the average 
mark obtained in these second-year psy- 
chology courses. 

The proportion of students taught by a
high-achievement instructor who elect to
take further courses in psychology compared
with that of those taught by a highly
evaluated instructor is higher although not
significantly so (F = 4.10, df = 1/18, .05 <
p < .10). The mean marks in subsequent
psychology courses are more closely related
psychology courses are more closely related 
to previous achievement than to instructor 
evaluation (F = 9.04, df = 1/99, p < .01). 

It is interesting to note that the one group
which demonstrates both the highest percentage
of students continuing and the highest
marks in second-year courses is that of
students taught by high-achievement-low-evaluation
instructors.

When the specific courses for which
students registered in the second year were
examined, it was found that a smaller percentage
of students taught by high-achievement-low-evaluation
instructors tended to
register for second-year major courses in
psychology than of students taught by low-achievement-high-evaluation
instructors
(5.3% versus 8.4%). However, even in these
major courses, the marks obtained by the
students of high-achievement-low-evaluation
instructors were higher (61.0% versus
56.5%). Also, a higher proportion of
students taught by low-achievement-high-evaluation
instructors dropped psychology
major courses after registration.


TABLE 2
PERFORMANCE OF STUDENTS IN SECOND-YEAR
PSYCHOLOGY COURSES ASSOCIATED WITH
VARIOUS TYPES OF INSTRUCTOR
CHARACTERISTICS

Characteristics of instructors      Academic performance in
                                    second-year psychology courses

Achievement    Evaluation    n      % students    M mark
high           high          8      40.7          61.5
high           low           4      43.3          66.1
low            high          5      30.8          67.9
low            low           7      26.1          55.8

It seems, then, that in introductory psy- 
chology, at any rate, if an instructor con- 
centrates on producing a high level of 
achievement, his students are at least as 
likely to take subsequent courses and are 
more likely to do well in those courses than 
they would have if his emphasis had been 
on arousing interest and enthusiasm in the 
subject. On the other hand, the instructor 
who concentrates on arousing interest in 
the subject without at the same time tak- 
ing steps to ensure a high level of achieve- 
ment may be doing his students a disservice 
in that they may elect to major in the sub- 
ject, but lacking the necessary background, 
they may do poorly in or fail the required 
courses at the second-year level. 

This finding may be of considerable im- 
portance since most beginning instructors 
tend, and are in fact encouraged, to spend 
a great deal of time finding ways of making 
the subject interesting but do not spend 
much time in finding ways to encourage the 
students to work hard and are apprehensive 
about insisting on high standards of per- 
formance. 

Our results certainly suggest that because 
students are likely to be initially interested 
in a subject such as psychology and because 
they have had no previous experiences with 
the subject which were discouraging, the 
instructor would be well advised to concen- 
trate on student achievement and not to be 
overly concerned with arousing interest in 
the subject. It is of crucial importance, however,
that this result should not be overgeneralized
since in a subject for which initial
interest is not likely to be high and in which
the student may have had previous experiences
that were discouraging, for example,
mathematics, the interest-arousing role of
the instructor may be of much more importance.
MULTIDIMENSIONAL SCALING OF CONCEPT LEARNING
IN AN INTRODUCTORY COURSE¹


HOWARD WAINER² AND KENNETH KAYE
The University of Chicago


Using a model for individual differences in multidimensional scaling
(INDSCAL), 16 concepts dealing with developmental psychology
were scaled before and after a course. It was found that 3 dimensions
emerged, but that 1 of them was relatively unimportant for the
course. The postadministration indicated that this dimension was perceived
as having lesser importance. The criterion measure used was
the instructor's position in the subject space; movement toward him
by the students was significant from pre- to postcourse administration.
Implications of this methodology for educational evaluation are discussed.


A major goal of any course of instruction
is the integration of concepts into a cohesive 
structure. The recall of facts and the ability 
to define concepts are fairly easy outcomes 
to assess, but the extent to which students 
understand interrelationships among the 
facts and concepts is problematic. Relation- 
ships are more difficult to define; there 
is far less agreement among instructors 
and among authors as to the meaningful 
structure of the subject matter; and the 
instructor is usually ambivalent about
whether his students should be acquiring the
structure, his structure, or their own structure.
One can make inferences about the
body of knowledge sampled by factual or 
definition questions (e.g., Lord & Novick, 
1968), but instructors desiring a more 
qualitative assessment ordinarily use essay 
examinations. The experience of preparing 
for and writing essays may be valuable to 
the students, but as measures of instruc- 
tional effectiveness they are nonobjective, 
difficult to score, and relatively unreliable
(Nunnally, 1970, p. 200).

Wainer and Berg (1972) used multi- 
dimensional scaling (MDS) to determine, 
objectively and reliably, college students’ 
perceived structure of interrelationships 
among nine short stories by De Maupassant. 
This study used a matrix of similarities ob- 
tained from a group of advanced French 
majors, employing the Shepard (1962) and 
Kruskal (1964) method of multidimensional 
scaling. A second study using the same pro- 
cedure (Berg & Wainer, 1973) asked two 
further questions. Students judged the 
similarity in pairs of poems by Baudelaire, 
before and after several weeks of lectures 
and discussions. In addition, the perceived 
structure of students who read the poetry in 
the original French was compared with that 
of students who read the same poems in 
translation. The results of this study were 
mixed. Readers of the translations ap- 
parently perceived only one dimension deal- 
ing with ideas and imagery, while readers 
of the original French poems perceived a 
second dimension involving the rhetoric and 
sound patterns. With only nine poems as
stimulus items, multidimensional scaling 
would be unlikely to establish more than 
two dimensions reliably. The comparison 
of structures before and after instruction 
also yielded only gross effects, and the
observation of changes was of an exploratory,
speculative nature: confirmation had to 
await a more delicate methodology. 

The present study is an improvement in 
several respects: (a) the intervening treat- 
ment was a full course of study, including 
films, discussions, demonstrations, lectures, 
reading, and examinations; (b) stimulus 
items were single concepts, making it possi- 
ble to include enough (16 items) so that 
several stable dimensions could emerge 
(Klahr, 1969); (c) the concepts were drawn
from a longer list which formed the explicit 
curriculum from the students’ point of view: 
their grades depended upon examinations 
consisting solely of pairs of items ("define 
and relate”) drawn from the same list of 80 
concepts; (d) it was possible to determine 
the position and movement of individual 
subjects in the dimensional space of the 
groups as a whole; and (e) it was possible 
to determine the extent to which the instruc- 
tor himself perceived the same dimensions 
as those of the students. 

The latter two improvements resulted 
from the use of a powerful tool: Carroll and 
Chang's (1970) model for individual differences
in multidimensional scaling (INDSCAL),
a special case of Tucker's three-mode
factor analysis model (Tucker, 1972).
Their model yields essentially the same 
spatial configuration of items as the Shep- 
ard-Kruskal method, but in addition, it 
uses each individual subject’s judgments to 
weight, for him alone, the importance of 
each of the obtained dimensions. This is 
not the same as obtaining each subject’s 
personal dimensions, but in some respects 
it is better. It allows us to plot individual 
subjects in the same space as the items, 
where the coordinate of a particular sub- 
ject on a particular dimension is the weight 
he attaches to that dimension. Moreover, it 
calculates the proportion of each individ- 
ual’s variance accounted for by the struc- 
ture obtained from the group as a whole. 
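
The weighting idea can be made concrete with a small sketch. In Carroll and Chang's model, the distance subject k perceives between items i and j is a weighted Euclidean distance in the group space, with each squared coordinate difference multiplied by that subject's weight on the dimension. The item coordinates below are approximate values read from Table 1, the instructor's weights are the ones reported later in the Results (his slightly negative weight on Dimension III is set to zero here as an assumption), and the student's weights are hypothetical.

```python
import math

def indscal_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two items for one subject.

    x_i, x_j: item coordinates in the common group space.
    weights:  that subject's nonnegative weight on each dimension.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))

# Approximate coordinates (Dimensions I, II, III) read from Table 1.
reflex = (-0.6, 0.0, 0.3)
formal_operations = (0.2, 0.3, -0.1)

# Instructor's reported weights; -.03 on Dimension III is clamped to 0 (assumption).
instructor = (0.34, 0.51, 0.0)
# Hypothetical student who leans on Dimension III far more than the instructor.
student = (0.30, 0.20, 0.35)

d_instructor = indscal_distance(reflex, formal_operations, instructor)
d_student = indscal_distance(reflex, formal_operations, student)
print(round(d_instructor, 3), round(d_student, 3))
```

The same pair of items is therefore "stretched" differently for each subject, which is why subjects can be plotted by their weights in the same space as the items.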


METHOD 


Subjects were 45 undergraduates drawn ran- 
domly from 90 who completed the introductory 
course in developmental psychology at the Univer- 
sity of Washington in the summer quarter, 1972. 
The text (Johnson & Medinnus, 1969), films, and 
live demonstrations were supplementary, in the
instructor's mind, to the 20 lectures and the
weekly discussion sections taught by a graduate
assistant. Both the lectures and the discussions
focused upon key concepts presented to the
students in the syllabus at the first meeting, with
the understanding that each of their four half-hour
examinations would consist of five pairs
drawn from this list. This method had been used
by the instructor in three previous undergraduate
courses at the University of Washington; its
advantages are that students know what to expect
without knowing the exact questions (they are
told to expect pairs like "assimilation versus
accommodation" rather than distant ones like "sex
role versus formal operations"), their responses
are short and pertinent to what the instructor
hopes to convey, and they can be graded reliably.
Our multidimensional-scaling procedure
involved only 16 of the concepts, chosen so as to
vary along 3 dimensions which the instructor
thought would characterize the structure of his
course: (a) infancy-childhood-adolescence (phenomena),
(b) nature-interaction-nurture (process),
and (c) mental-libidinal-operant (theory).
The 16 concepts together with the definitions
handed to the subjects are as follows:


socialization—process by which the child learns to
behave in accord with the expectations of
family, peers, and society;

formal operations—mental manipulations of objects
and ideas, which together give adult thought
its structural, logical properties;

basic trust—the first stage toward the development
of a mature identity, in which the infant
relies on the consistent care he receives from
adults;

attention—focusing one's perception on a given
stimulus or set of stimuli while ignoring others;

operant—a class of responses whose frequency of
occurrence is determined by the way they
have been rewarded or punished in the past;

reflex—unlearned, involuntary, specific response to
stimulation of a given kind;

intention—motivation determined by the end to
be achieved, which maintains an organism's
behavior even in the absence of any physical
stimulus or previous reward;

schema—the regularity underlying a class of
responses, capable of being organized hierarchically
toward the attainment of a goal or end state;

conservation—a system of mental operations
enabling older children to predict correctly that
the amount of some substance will remain
unchanged when it is poured or molded or
[illegible] out into a different shape;

regression—behavior that was appropriate at an
earlier stage, but has been revived in
childhood or adulthood when it is regarded as
pathological;

alienation—a feeling, in adolescents and adults,
that they do not belong or are not appreciated in
their families, peer groups, or society in general;

innate component—behavior which develops much
more quickly and efficiently because of the genetic
endowment of an individual or species, than it
could have been acquired through learning or
random variation;

sex role—the learned conformity to behavioral
standards for girls and boys in a given society;

concept—rule relating a set of objects or ideas
which share a common set of attributes and which
often, but not always, share the same name in a
language;

aggression—hostile, threatening behavior toward
other organisms of one's own species; and

symbol—the signifier for a class of things, which
(from about the age of two years) we differentiate
from the objects for which the symbol stands.


All 120 possible pairs were presented in a ran- 
dom-order, successive-intervals format which was 
the same for all subjects. In addition, 8 randomly 
selected pairs were repeated so that we might 
estimate the reliability for each subject. Subjects 
were asked to score each pair for similarity on a 
scale from 1 to 15. A score of 1 was to mean that 
the concepts were “virtually identical” and 15 that 
they were “as different as could be.” The subjects 
took from 25 to 55 minutes to complete this 
questionnaire, once at the beginning of the course 
and again at its end, 10 weeks later. The instruc- 
tor and his teaching assistant also completed the 
questionnaire. 
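
As a check on the counts above, 16 concepts yield 16 x 15 / 2 = 120 unordered pairs, and the 8 repeated pairs bring the questionnaire to 128 judgments. The following is a minimal sketch of building such a questionnaire; the fixed seed, the way repeats are chosen, and their placement at the end are assumptions, since the article does not describe how the order was generated.

```python
import itertools
import random

concepts = [
    "socialization", "formal operations", "basic trust", "attention",
    "operant", "reflex", "intention", "schema", "conservation",
    "regression", "alienation", "innate component", "sex role",
    "concept", "aggression", "symbol",
]

# Every unordered pair of distinct concepts: C(16, 2) = 120.
pairs = list(itertools.combinations(concepts, 2))

# One fixed random order shared by all subjects (seed is an assumption).
rng = random.Random(0)
ordered = pairs[:]
rng.shuffle(ordered)

# Eight randomly selected pairs are presented a second time for reliability.
repeats = rng.sample(pairs, 8)
questionnaire = ordered + repeats  # placement of repeats is an assumption

print(len(pairs), len(questionnaire))
```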


RESULTS 


The sample was split arbitrarily into 
“exploratory data” (n = 20) and “con- 
firmatory data" (n = 25), so that hy- 
potheses generated by inspection of one 
set of data could be confirmed using the 
uninspected data. Moreover, previous re- 
search (Berg & Wainer, 1973; Wainer & 
Berg, 1972) indicated that as few as 20 sub- 
jects appeared to yield stable estimates of 
item coordinates. Splitting the data allows 
us to test this.

Since eight item pairs were repeated in 
the questionnaire, each subject’s reliability 
could be estimated separately. Reliabilities 
varied considerably, but the median was
reasonably high: about .8 in both the precourse
and postcourse administrations of the
questionnaire; none was less than .5.
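
The article does not say how each subject's reliability was computed from the eight repeated pairs. One plausible reading, offered here only as an assumption, is the Pearson correlation between a subject's first and second ratings of those pairs. The ratings below are hypothetical values on the 1 to 15 scale.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings by one subject of the 8 repeated pairs.
first_rating = [2, 14, 7, 9, 3, 12, 5, 10]
second_rating = [3, 13, 8, 7, 4, 12, 6, 11]

reliability = pearson_r(first_rating, second_rating)
print(round(reliability, 2))
```

With only eight pairs per subject, any such estimate is coarse, which fits the report that individual reliabilities varied considerably.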
FIGURE 1. The structure of the data for the exploratory analysis. (Abbreviation: KK =
Kenneth Kaye, the instructor.)


Exploratory analysis. The input to IND- 
SCAL consisted of a three-dimensional 
matrix of judgments, items by items by sub- 
jects. This is most easily visualized as 42 
layers of interitem distance ratings. As 
shown in Figure 1, one layer is that of the 
instructor, another consists of the judgments 
of the teaching assistant, 20 layers come 
from the students before the course, and 20 
more come from the same students after 
the course. 

A three-dimensional solution was ac- 
cepted; the correlation between the data 
and the fit in three dimensions was .62, un- 
usually low for INDSCAL, but about as 
high as could be expected given the attenu- 
ating effect of within-subjects reliability. 
There were 2 three-dimensional structures 
to be examined: first, the spatial configura- 
tion of items and, then, the positions of in- 
dividual subjects within that space.* 

The matrix of coordinates in three-space 
for the 16 concepts is shown in Table 1. 
The first dimension has reflex, innate com- 
ponent, and basic trust at one end, gapped 
(Tukey, 1971, Chapter 18x; Thissen & 
Wainer, 1973) from the terms relating to 
older children and adolescents at the other 
end. We interpret this dimension as 
"infancy-childhood-adolescence," granting


* These are, of course, the duals of one another. 


TABLE 1
COORDINATES OF THE 16 CONCEPTS ON EACH OF THE 3 DIMENSIONS: EXPLORATORY ANALYSIS

Coordinate | Dimension I | Dimension II | Dimension III
 .5 |  |  |
 .4 |  | Conservation | Alienation
 .3 |  | Formal operations, concept, symbol | Aggression, reflex, regression
 .2 | Formal operations, concept, alienation, symbol | Schema | Innate component
 .1 | Conservation, regression, operant | Intention |
 .0 | Schema, socialization, sex role, intention, aggression | Innate component | Conservation, symbol
-.0 | Attention | Reflex, attention, socialization | Concept, attention
-.1 |  | Operant, basic trust, sex role | Intention, formal operations, operant, schema
-.2 |  | Aggression | Sex role
-.3 | Basic trust | Regression, alienation | Basic trust
-.4 |  |  | Socialization
-.5 | Innate component |  |
-.6 | Reflex |  |


that more advanced students would have 
placed concept and symbol somewhere be- 
tween infancy and childhood rather than 
with formal operations and alienation. 
The second dimension is also easy to 
interpret. Piagetian terms are at one end, 
Freudian terms at the other, and terms from 
learning theory fall in the middle. We call 
this dimension "Freud-Piaget." 
The third dimension was difficult to inter- 
pret initially; perhaps the reader should 
refer to Table 1 and attempt his own inter- 
pretation before reading further. The in- 
structor (Kenneth Kaye) was unable to 
name the dimension or to understand it with 
respect to his course. Table 1 was shown to 
two graduate students who had taken his 
course at the University of Chicago as well
as to a colleague in the Department of Edu- 
cation and Human Development. For Di- 
mensions I and II, their interpretations 
agreed with ours; Dimension III was 
labeled adjusted-maladjusted by one,
prosocial-school-antisocial by another, and
socialized-organismic by the third. These
interpretations seem to be based primarily
upon the two ends of the dimension. We 
shall give it the name "prosocial-antiso- 
cial.” Why did the instructor have difficulty 
interpreting this third dimension? 
In the previous section, we listed three
dimensions which the instructor predicted
would characterize his course. One of these
corresponded exactly to Dimension I, which
is, after all, an obvious dimension for a
course entitled "Child Development."
Dimension II, Freud-Piaget, is similar to a
predicted dimension except that the
instructor thought he conceived of libidinal
theory as midway along a cognitive to
behavioral dimension. Dimension III, too, is
something like one of the predicted
dimensions, nature-nurture, which suggests
that the instructor should have found it
easy to interpret. The predictions had been
set aside for eight months, however, and
were forgotten since they had been used
only for reference in selecting stimulus items
that would vary along some expected
dimensions. We could conclude so far,
then, that multidimensional-scaling
structures merely reflect the experimenter's
choice of stimuli. However, the dimensions
become interesting when we look at how
individual subjects weighted them.
Figure 2 presents "box and whisker" plots
(Tukey, 1971) of the weightings of the
students for each of the three dimensions.


4 The "box and whisker" plots shown in Figure
2 are a schematic way of displaying a distribution.
The "*" indicates the extreme values of
FIGURE 2. Box and whisker plots of the students' weightings of each of the obtained
dimensions: an exploratory analysis. (Abbreviation: KK = Kenneth Kaye, the instructor.)


For any given subject on a particular day (before
or after the course), the three weightings
tell us his position relative to the structure
of the whole class. This gives us a way
of comparing individual subjects with the 
instructor and testing for before-after 
movement through this common, conceptual 
reference space. 

The weighting computed by INDSCAL 
for the instructor on Dimension I was .34, 
almost exactly the median of the class. On 
Dimension II, his weighting was .51, higher 
than any of the students. Although prior to 
the course he had not described the theoretical
dimension as having Freud at the
opposite end from Piaget, he used this dimension
most heavily in making the paired
judgments. On Dimension III, his weighting
was -.03, lower than any of the students.
He did not think of his course as presenting
a prosocial-antisocial perspective; he did

the distribution, read off the stem. The lines on 
the ends of the box represent the 25th and 75th 
percentiles of the distribution; thus the box 
encloses the middle 50% of the data. The middle 
line is the median. This method of data display is 
quite useful in that it gives the “flying eye" a 
picture of overall level and spread simultaneously,
and this allows easy comparison across groups. 


not use it in rating the concepts for similar- 
ity; and he could not even interpret Di- 
mension III when shown Table 1. 

Since Dimensions II and III were the ones 
on which the instructor and his students differed,
we plotted their weights in Dimension
II against those in Dimension III (Figure 
3). In this figure, the subject identification 
numbers are shown along with the letter B 
or A for before or after. Evidently, 9 of the 
20 moved toward the instructor on both di- 
mensions, and another 10 moved toward him 
on either the Freud-Piaget dimension (giv- 
ing it greater weight after the course) or the 
prosocial-antisocial (giving it less weight). 
More technically, the “before” centroid is 
significantly further from KK (Kenneth 
Kaye, the instructor) than the “after” cen- 
troid (p < .05). 
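
The movement analysis can be sketched as follows. The instructor's weights on Dimensions II and III (.51 and -.03) come from the text; the three students' before and after weights are hypothetical, and the sketch only counts who moved closer and compares centroid distances. It does not reproduce the significance test, whose form the article does not state.

```python
import math

# Instructor's weights on (Dimension II, Dimension III), as reported in the text.
KK = (0.51, -0.03)

# Hypothetical students: id -> (before weights, after weights).
students = {
    1: ((0.20, 0.35), (0.30, 0.25)),
    2: ((0.25, 0.30), (0.22, 0.20)),
    3: ((0.15, 0.10), (0.10, 0.20)),
}

def distance_to_instructor(weights):
    """Euclidean distance from a subject's weights to the instructor's."""
    return math.dist(weights, KK)

def centroid(points):
    """Mean point of a collection of 2-D weight vectors."""
    pts = list(points)
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(2))

moved_closer = [
    sid for sid, (before, after) in students.items()
    if distance_to_instructor(after) < distance_to_instructor(before)
]

before_centroid = centroid(before for before, _ in students.values())
after_centroid = centroid(after for _, after in students.values())

print("moved closer:", moved_closer)
print("centroid distance before:", round(distance_to_instructor(before_centroid), 3))
print("centroid distance after: ", round(distance_to_instructor(after_centroid), 3))
```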

Note that the graduate assistant per- 
ceived the concepts in a manner similar to 
that of the instructor, but still well within 
the range of the students. The fact that she 
graded the examinations may help to ex- 
plain our failure to get a significant correla- 
tion between final grade and movement to- 
ward the instructor. 

FIGURE 3. The weightings of the exploratory subjects on Dimension II and Dimension
III before (B) and after (A) a course in developmental psychology. (Abbreviation: KK =
Kenneth Kaye, the instructor.)


Confirmatory analysis. The data from 25
additional subjects, set aside during the first
phase of analysis, were now used as input
to INDSCAL. The three-dimensional solution
had about as good a fit as before
[r (data, fit) = .61]. The matrix of item
coordinates was virtually identical to Table 1.

The correlations between exploratory and
confirmatory samples, for the rank order of
items along each dimension, were .97, .95,
and .92 for Dimensions I, II, and III,
respectively. Although our interpretation and
labeling of these dimensions remains
subjective, the great similarity between
replications of the analysis indicates that sample
sizes of about 20 are sufficient to give highly
reliable estimates of item coordinates and
of dimensional structure in this application
of multidimensional scaling.
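
A rank-order correlation of this kind can be computed as below. The sketch assumes Spearman's rho with no tied values (the article does not name the coefficient), and the two coordinate lists are hypothetical, since the confirmatory coordinates are not printed.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical coordinates of six items on one dimension in each sample.
exploratory = [-0.60, -0.50, -0.30, 0.00, 0.10, 0.20]
confirmatory = [-0.55, -0.58, -0.25, 0.05, 0.02, 0.30]

rho = spearman_rho(exploratory, confirmatory)
print(round(rho, 3))
```

Because only the ordering of items enters the coefficient, two solutions can agree closely in rank order even if their scales differ.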

Examination of the projections of subject
scores in the Dimension II-III plane
indicates that the instructor's position is,
once again, rather solitary. As before,
movement toward him by the students is
significant (p < .05). Sadly, the degree of
movement appears to be unrelated to grade (r =
.18), number of previous psychology courses,
or sex. We are uncertain if this reflects badly
on multidimensional scaling or on the
method of course evaluation which
determined grades (see Figure 4).


FIGURE 4. The weights of the confirmatory subjects on Dimension II and Dimension III
before (B) and after (A) a course in developmental psychology. (Abbreviation: KK =
Kenneth Kaye, the instructor.)


DISCUSSION

In this paper, we have utilized an
individual-difference model of multidimensional
scaling to get estimates of the effect of
instruction upon movement through a
conceptual space. The results attest to the
power of Carroll and Chang's (1970)
INDSCAL model. The remarkable difference
between the instructor and his students in
their use of the three major dimensions,
which held in replication with the
confirmatory sample, was unexpected but
believable. The change in perception of the
class as a whole (as measured by the
"before" and "after" centroids of weights),
shifting significantly toward the
instructor's own view of his course, was a
modest but very important effect.

A subsequent study in which the instructor
introduces relevant dimensions as part
of the explicit content of his course should
achieve more drastic changes in the
students' conceptual structures. If such a study
found less overlap between the "before"
and "after" data points, it might be good to
construct a separate multidimensional-scaling
space from each of the two sets of data.
In the present experiment, however, the
overlap indicates that the first three
dimensions used remained the same at the end
of the course. Carroll and Chang's procedure
enables us to trace a shift in the relative
importance attached to those dimensions
by the students. Infancy-childhood-adolescence
was an obvious continuum to
keep in mind when judging concept pairs.
On the less obvious second and third
dimensions, the course apparently influenced
the students to regard the theoretical loading
of a concept as more important, and the
prosocial-antisocial loading as less important,
than they had done initially.

Another goal for future research in this 
area is to discover or produce a relationship 
between a student's movement toward the 
instructor and the subjective grade he receives.
The implications of such a relationship
are obviously complex, and it is not
clear that the best students are those who
assimilate their professor's biases. It is still
too early to advocate multidimensional
scaling for assessing students, but it does
yield useful insights for the instructor.

In addition to refining the present appli- 
cation of multidimensional scaling, we hope 
to explore others. Here we have measured 
individual perceptions of, and changes in, 
the underlying structure of a course; we 
have used the instructor’s position in the 
same dimensional space as a criterion mea- 
sure for individual subjects. A somewhat 
different application would be in the study 
of attitude change. In the research of Sherif 
and Hovland (1961), the extent to which a 
person is influenced by an argument is 
curvilinearly related to the initial distance 
between his own attitude and the presented 
one. The individual-difference multidimensional-scaling
procedure makes it possible
to view each person within an attitude 
space, including the person attempting to 
influence the others. Thus, the assimilation 
and contrast effect can be seen multidimen- 
sionally. A person whose attitude is near 
the target on one dimension might be far 
from it on another; theory predicts he 
should be positively influenced by the argument
in the dimension on which he was
relatively near it and yet move away from 
it on the other dimension. Even more im- 
portant than testing this hypothesis, how- 
ever, is the question of how two or more 
cognitive dimensions will interact, so that 
persuasion on one dimension may produce 
positive change along several, wiping out 
the predicted contrast effects. Similarly, 
aversion to an argument along one dimen- 
sion (racism, for example) may preclude 
being influenced by it along other dimen- 
sions. 

Multidimensional scaling has not existed 
long enough to have produced any longi- 
tudinal data as yet, but the developmental 
applications of this individual-difference 
model are very promising. Suppose, for 
example, that we administered a multidi- 
mensional-scaling questionnaire to pre- 
adolescents and their parents, using as stim- 
ulus items, pictures or descriptions of family 
roles. The same subjects could be tested 5 
and 10 years later; children should change 
more than adults, in the direction of their 
own parents more than in a chance direc- 
tion, toward the same-sexed parent more 
than the other, and so forth. 
SOME PROBLEMS IN THE APPLICATION OF ACHIEVEMENT
MOTIVATION TO EDUCATION:
THE ASSESSMENT OF MOTIVE TO SUCCEED AND
PROBABILITY OF SUCCESS


CHARLES B. SCHULTZ
Trinity College

University of Connecticut


Two objective measures of motive to succeed, one used by Hermans
and the other by Mehrabian, were administered to 93 subjects. The
preferences of motive to succeed (Ms) and motive to avoid failure
(Maf) subjects for academic tasks which varied in difficulty were
compared using individual and group standards for determining
probability of success (Ps). The biasing effect of Ms and Maf on Ps was
also examined. Both Ms instruments were multidimensional; they
correlated with internal achievement responsibility for success and
measures of academic achievement; and they were significantly related to
each other. Hermans was the more reliable test. The Ms and Maf
risk preferences were most consistent with theoretical expectations
when group standards of Ps were used. Overestimation of Ps was
directly related to Ms. Both Ms and Maf subjects overestimated Ps
more on difficult tasks than on easy tasks.
Achievement motivation as conceived by
Atkinson (1964, 1966) results from the
interaction of three primary components, one
situational and two personality variables.
Individuals may be characterized by the
relatively stable traits of the motive to
succeed (Ms) and the motive to avoid failure
(Maf). For some persons, one or the other
trait is dominant. Thus, a person may be
basically success oriented (Ms > Maf) or
failure threatened (Maf > Ms). Both
personality types face tasks which vary in
difficulty and for which they make subjective
judgments of their probability of success
(Ps). In addition to the primary constructs
duct and analyses of this research. They are also
indebted to Robert K. Gable of the University of
Connecticut for his assistance and advice regarding
the factor analyses and to Charles Clock, Ed-
pd Bierman, and Francis Whittle of the West
Hartford School District for their cooperation.
zom Trinity College, Hartford, Connecticut.
i Schultz, Department of Education, Trinity Col-
of Ms, Maf, and Ps, three derivatives of Ps
are assumed to account for achievement
motivation. These include the incentive
value of success (Is, or 1 - Ps), the
probability of failure (Pf, or 1 - Ps), and the
incentive value of failure (If, or -Ps).

The crux of achievement motivation
theory is the notion that individuals who are 
highly success oriented relative to their 
fear of failure prefer tasks of moderate Ps, 
while persons with the opposite tendencies 
prefer either extremely easy or extremely 
difficult tasks. Application of this potentially 
useful theory to classroom instruction has 
been hampered by at least two assessment 
problems, the measurement of Ms and of 
Ps. These problems are complicated by the 
fact that risk-taking behavior across tasks 
is not highly correlated (Weinstein, 1969); 
therefore, findings using games and other 
tasks may not be generalizable to academic
work. 
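
The preference pattern described above follows from how the components combine. The sketch below uses the standard multiplicative form of Atkinson's model, which the text does not spell out and is therefore an assumption here: the tendency to approach success is Ms x Ps x Is, the tendency to avoid failure is Maf x Pf x If, and the resultant is their sum. The motive strengths are hypothetical.

```python
def resultant_tendency(ms, maf, ps):
    """Resultant achievement tendency for a task with success probability ps."""
    i_s = 1 - ps   # incentive value of success
    p_f = 1 - ps   # probability of failure
    i_f = -ps      # incentive value of failure (negative)
    approach = ms * ps * i_s
    avoidance = maf * p_f * i_f
    return approach + avoidance  # equals (ms - maf) * ps * (1 - ps)

levels = [0.1, 0.3, 0.5, 0.7, 0.9]

# Hypothetical motive strengths for the two personality types.
success_oriented = [resultant_tendency(2.0, 1.0, p) for p in levels]
failure_threatened = [resultant_tendency(1.0, 2.0, p) for p in levels]

for p, a, b in zip(levels, success_oriented, failure_threatened):
    print(f"Ps={p:.1f}  Ms>Maf: {a:+.2f}  Maf>Ms: {b:+.2f}")
```

Under this form the resultant peaks at Ps = .50 when Ms > Maf and is most negative there when Maf > Ms, which is why failure-threatened persons are expected to prefer the extremes.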

The measurement of Ms, a primary problem
facing educational application of the theory
(Atkinson, 1965; Maehr & Sjogen, 1971),
typically relies on projective techniques
such as the Thematic Apperception
Test and the French Test of Insight. Objective
versions of tests of Ms are needed not
only to increase convenience in administration
and scoring but to improve upon the
internal consistency and the test-retest
reliabilities of the projective measures, which
have been low (Clarke, 1973; Weinstein,
1969). Weinstein's (1969) examination of
self-report instruments found few relationships
among the tests, little relationship
between objective and projective instruments,
and an inability of virtually all measures
to predict risk-taking behavior of Ms
and Maf subjects on a variety of tasks. The
present study is an attempt to extend
Weinstein's (1969) research by assessing the
reliability (internal consistency) of two more
recent objective measures of Ms by Hermans
(1970) and Mehrabian (1969) and by
assessing their power to predict the
risk-taking behavior of Ms and Maf subjects on
tasks that are characteristic of schoolwork.
The Mehrabian instrument, in particular,
was selected because of successes reported
with it (Farley & Mealiea, 1973; Weiner,
Johnson, & Mehrabian, 1968; Weiner &
Potepan, 1970; Wolk & DuCette, 1973).
Assessment of Ps has not received the
systematic attention given to the measurement
of Ms, yet it is just as important,
particularly because other factors in the theory
are derived from it. There are at least two
problems associated with the determination
of Ps. The first deals with the manner in
which Ps is estimated. Atkinson (1964)
conceived of Ps as an individual, subjective
judgment. In actuality, combinations of the
various estimates of Ps have been used in
research on achievement motivation. Isaacson
(1964), for example, used a ratio of average
grades in a course to average intelligence of
the students, implying an objective, group
estimate of Ps which is, in effect, the opposite
of Atkinson's (1964) suggestion. A similar
approach was used by deCharms and
Carpenter (1968), who defined Ps by the
difficulty level of arithmetic and spelling items.
Frequently, Ps is induced by informing
subjects of the number of others like
themselves who have been successful on a task
(Weiner, 1972); this also implies a group,
objective standard. Weinstein (1969) used
what amounted to both individual-objective
and individual-subjective estimates. The
present study is an attempt to determine
which, if any, combination of task-specific
estimates of Ps approximates the theoretical
expectations derived from Atkinson's theory.
A second problem in the assessment of Ps
is the tendency to bias estimates of Ps which
arises from the lack of independence between
Ms and Ps. Atkinson (1964) notes that
persons who are high in Ms are predisposed to
overestimate Ps (i.e., perceive their chances
of success as greater than objective difficulty
level would suggest) as compared to those
who are low in Ms or who are anxious about
failure. Thus, a general biasing across
levels of Ps is implied. Feather (1965)
advanced two hypotheses which suggest a
more complex relationship. The first is that
overestimation of Ps by Ms subjects and
underestimation of Ps by Maf subjects will
be greatest when the tasks are moderately
difficult, that is, when Ps = .50. Feather's
(1965) contention is that since Ms persons
have been most successful at that difficulty
level and Maf persons have been relatively
unsuccessful at that same level, the
groups approach moderately difficult tasks,
in particular, with different expectations.
The second hypothesis is based on the
assumptions that (a) overestimation
is a function of the subjective attractiveness
of success (Ms X Is) and (b) underestimation
is the result of the subjective repulsiveness
of failure (Maf X If). For persons in
whom Ms > Maf, overestimation of Ps is
greatest when Is is high and, therefore, Ps
is low. When Maf > Ms, underestimation
of Ps is greatest when If is high and,
therefore, Ps is high. Accordingly, the alternative
hypothesis holds that the underestimation
of Ps by Maf subjects is greatest on easy
tasks and that there is little, if any,
underestimation on difficult tasks, while the
overestimation of Ps by Ms subjects follows the
opposite trend. The design of the present
study provides an opportunity to test
both of Feather's hypotheses and the
positive relationship between Ms and Ps
suggested by Atkinson (1964).
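
The direction of the second hypothesis can be illustrated with a short sketch. It treats the "pull" toward overestimation as Ms x Is and the pull toward underestimation as Maf times the magnitude of If, taking Is = 1 - Ps and If = -Ps from the definitions given earlier. The function names and the unit motive strengths are hypothetical, and no claim is made about the size of the resulting bias.

```python
def overestimation_pull(ms, ps):
    """Subjective attractiveness of success, Ms x Is, with Is = 1 - Ps."""
    return ms * (1 - ps)

def underestimation_pull(maf, ps):
    """Subjective repulsiveness of failure, Maf x |If|, with If = -Ps."""
    return maf * ps

difficult, easy = 0.25, 0.75  # hypothetical objective Ps for two tasks

# Ms subjects: strongest pull toward overestimation on the difficult task.
print(overestimation_pull(1.0, difficult), overestimation_pull(1.0, easy))

# Maf subjects: strongest pull toward underestimation on the easy task.
print(underestimation_pull(1.0, difficult), underestimation_pull(1.0, easy))
```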
In summary, the purposes of the present
study are to assess the reliability and
validity of two objective measures of Ms, to
compare several ways of estimating Ps, and
to examine the biasing effect of Ms on Ps.
Since an overriding concern of this study is
the application of achievement motivation
to education, academic tasks were used with
a junior high school population.


METHOD 


Subjects 


The subjects were 93 male ninth graders who 
were randomly drawn from two suburban junior 
high schools. For the analyses of variance, the 
top quartile (n = 23) of each Mg scale (or resultant 
measure) defined the Ms > Mar group for that 
scale (hereafter refered to as Ms subjects) and 
the bottom quartile defined the Mar > Ms group 
(hereafter refered to as Mar subjects). The other 
analyses were based on all subjects. 
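
The quartile split described above can be illustrated with a minimal sketch. The subject identifiers and scores below are hypothetical, and the function name `quartile_groups` is an assumption introduced only for illustration; ties at the cutoff are broken arbitrarily by sort order, which the original procedure may have handled differently.

```python
def quartile_groups(scores, n_per_group):
    """Return (top, bottom) lists of subject ids ranked by score.

    scores: dict mapping subject id -> scale score (hypothetical data).
    n_per_group: size of each extreme group (23 of 93 in the study).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:n_per_group]        # Ms > Mar group ("Ms subjects")
    bottom = ranked[-n_per_group:]    # Mar > Ms group ("Mar subjects")
    return top, bottom


# Hypothetical scores for 93 subjects; real scale scores would go here.
scores = {f"s{i:02d}": i for i in range(1, 94)}
ms_group, mar_group = quartile_groups(scores, 23)
```

With 93 subjects and 23 per group, the middle 47 subjects enter only the analyses that are based on all subjects.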


Instruments and Apparatus 


Achievement measures were obtained as a 
part of the regular school testing program two 
months before the present study was conducted. 
Some of the personality measures were admin- 
istered in their original form, while others were 
modified to meet the reading level and experience 
of ninth graders as determined by pilot work with 
the tests. Instruments to assess Ps and task at- 
tractiveness or preference (Ta) were constructed 
expressly for the purposes of the present study. 


Measures of Ms


Two measures of Ms were used. The first was
Mehrabian’s (1969) short form of the Resultant 
Achievement Motivation Test. The male scale 
was altered as follows: four items were modified 
to make them less complex and, therefore, easier 
for ninth graders to read, and four items were 
dropped because of inappropriate content for the 
present population. Thus, the final version of 
the male scale consisted of 22 items. Hermans’ 
(1970) 29-item Prestatie Motivatie Test had minor 
changes in verb tense and vocabulary. 

Measure of Mar. The Debilitating Anxiety
subscale of the Achievement Anxiety Scale by
Alpert and Haber (1960) served as a measure of
Mar. Seven of the 10 items of this subscale were
changed. These modifications were slight and
were limited to words which were difficult for
junior high subjects as determined by pilot work
with the test.

Measures of Ps and Ta. The Comprehensive
Test of Basic Skills was administered two months
before the present test battery. Based on that
administration, 21 items were selected which
varied in difficulty. Twelve items tested English
skills and 9 tested mathematics. The 21 items
were arranged into three categories of item
difficulty based on the scores of ninth graders at the
two schools. Seven were in the 20%-30% right
range; 7 in the 45%-55% right range; and 7 in the
70%-80% right range. The items were arranged
in sets of 3; each set tested similar skills (e.g.,
mathematics work problems) and contained 1
item from each level of difficulty. Each set was
enlarged to 5 items with the addition of an
extremely easy problem (e.g., 9,605 - 4,847) and a
problem that would be difficult for college
students (e.g., if dy/dz = 4(z* + 2)*, what does (d'y/
(dz) = ?). This procedure resulted in a total of
35 items, 20 in English and 15 in mathematics,
which were broken down into seven sets of 5 items.
Each set contained similar items with 1 item
representing each of the five levels of difficulty:
0-10, 20-30, 45-55, 70-80, 90-100. The items
comprising each set were printed on a separate sheet
of paper in a random order. The test booklet
contained four pages of English items followed by
three pages of mathematics items. The booklet
was presented to the subjects twice, on the first
occasion to measure Ta and on the second to
measure Ps.

Five-point scales were used for both Ps and
Ta measures. The Ps scale reflected the five
categories of the objective difficulty of the items.
It ranged as follows: (a) I would definitely get it
right (90%-100% sure of being correct); (b) I
would probably get it right (70%-80% sure); (c)
I could get it right or wrong (45%-55% sure);
(d) I would probably get it wrong (20%-30% sure);
and (e) I would definitely get it wrong (0%-10%
sure). The Ta scale was as follows: (a) would
definitely not want to do problem, (b) would not
like to do problem, (c) don't care about doing
problem, (d) would like to do problem, and (e)
would definitely want to do problem. Thus, the
Ta scale provides for both approach and avoidance
tendencies.


Estimates of Ps

Three task-specific estimates of Ps were used
as follows: objective group, subjective group,
and subjective individual. The objective-group
estimate was based on the percentage of students
in the two schools who got the items right on the
Comprehensive Test of Basic Skills. A priori
judgments were made on the extremely easy or
extremely difficult items as described above.
Subjective-group estimates were the subjects'
average ratings of Ps for each item.
Subjective-individual estimates were simply the Ps ratings
given by the subject for each item.
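
As a rough illustration of how the three estimates differ, the sketch below computes each one for a single item. The mapping from rating letters to probabilities uses the category midpoints (.05, .25, .50, .75, .95) that appear later in the Results; treating the ratings as those midpoints, and all function names, are assumptions made here for illustration only.

```python
from statistics import mean

# Assumed midpoints for the five-point Ps rating scale, (a) through (e).
RATING_TO_PS = {"a": 0.95, "b": 0.75, "c": 0.50, "d": 0.25, "e": 0.05}


def objective_group(n_correct, n_tested):
    """Proportion of students who answered the item correctly."""
    return n_correct / n_tested


def subjective_individual(rating):
    """One subject's own Ps rating for the item."""
    return RATING_TO_PS[rating]


def subjective_group(ratings):
    """Average of all subjects' Ps ratings for the item."""
    return mean(RATING_TO_PS[r] for r in ratings)


# Hypothetical item: 50 of 200 students got it right; three subjects rate it.
obj = objective_group(50, 200)
grp = subjective_group(["a", "c", "e"])
ind = subjective_individual("b")
```

The same item can therefore carry three different Ps values, which is why the analyses are repeated once per estimate.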


Procedure 


The seven tests were administered in 2 one-hour
sessions. During the first session, Mehrabian's
(1969) Resultant Achievement Motivation Test,
both Facilitating and Debilitating subscales of
the Achievement Anxiety Scale (Alpert & Haber,
1960), and a 46-item school attitude and behavior
questionnaire were administered, in that order.
Two weeks later, the following tests were given:
the measure of Ta, the Prestatie Motivatie Test
(Hermans, 1970), the Intellectual Achievement
Responsibility Questionnaire (Crandall, Katkovsky,
& Crandall, 1965), and the measure of Ps.

² A listing of the Prestatie Motivatie Test items
according to their subscale membership and with
their factor loadings and classifications according
to Hermans' (1970) a priori analyses is available
from the first author upon request.

The tests were administered to groups of ap- 
proximately 25. Instructions to Ps and Ta sections 
cautioned the subject not to work the problems 
but to estimate the attractiveness and later the 
difficulty of each item. The Ps instructions alerted 
the subject to the range of difficulty included in 
the items. The Ta instructions asked the subject 
to imagine that he was selecting items for a test 
which no one else would see but which would show 
him how good he was in English and mathematics. 
Thus, an attempt was made to establish an evalua- 
tive situation which minimized extrinsic factors 
such as grades. 

All instruments except the measures of Ps and 
Ta and the school attitude and behavior ques- 
tionnaire were administered via 2 X 2 inch slides. 
Subjects read each item projected on the screen 
as they listened to a tape recorded reading of the 
item by the experimenter. The timing of the items 
was determined by pilot work at a different junior 
high school and was controlled by an audiosyn- 
chronizer device. 


RESULTS
Assessment of Ms


The internal consistency estimate of reliability
(α coefficient) was .55 for the Resultant
Achievement Motivation Test (Mehrabian,
1969) and .91 for Hermans' (1970)
Prestatie Motivatie Test, implying considerably
more consistency for the latter measure.
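
For readers unfamiliar with the α coefficient, the following is a minimal sketch of the standard Cronbach formula, α = k/(k - 1) × (1 - Σ item variances / variance of total scores). The data are hypothetical, and nothing here reproduces the study's own computations.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores: list of rows, one row per respondent, one column per item.
    """
    k = len(item_scores[0])                               # number of items
    item_vars = [variance(col) for col in zip(*item_scores)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of three subjects to two items.
consistent = [[1, 1], [2, 2], [3, 3]]   # items agree perfectly
mixed = [[1, 2], [2, 1], [3, 3]]        # items agree only partly
alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
```

The coefficient rises toward 1 as items covary more strongly, which is the sense in which .91 implies more consistency than .55.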

Both tests were factor analyzed and the 
reliabilities of subscales derived from those 
analyses were determined. The analysis of 
the Resultant Achievement Motivation Test 
yielded 10 factors with eigenvalues greater 
than 1.00. These factors accounted for 70% 
of the total variance, while the first factor 
alone accounted for 12%. A varimax rotation 
of the 10 factors yielded 3 factors, each con- 
taining three items with factor loadings of 
.40 or higher, 5 factors containing two items, 
and 2 specific factors containing only one 
item. One item did not load on any factor 
according to the .40 criterion. The three-item 
factors were identified as subscales with 
rubrics similar to those used by Mehrabian 
(1968). The first (α coefficient = .57) reflected
independence in two of the items
(preference for working alone and for
individual sports or games). The status of the
third item in this subscale (preference for a
book over a movie) was somewhat ambiguous.
The second subscale (α coefficient = [illegible])
was identified as pride in accomplishment
due to preferences for doing things well and
for love of winning. An item which tapped
future orientation was also included. The
third subscale (α coefficient = .52) reflected
choice of achievement activities by
preferences for games few people know, for
difficult, thought games, and for being in
business for oneself.

The Prestatie Motivatie Test (Hermans,
1970) was analyzed into nine factors with
eigenvalues greater than unity; these
accounted for 70% of the total variance. The
first factor accounted for 34% of the
variance. Five subscales that had more than
three items with factor loadings of at least
.40 were obtained from the varimax rotation
of the nine factors. The match between
Hermans' (1970) a priori judgments of the
items and the subscale membership obtained
[...] four of the five subscales. Some subscales
(e.g., Subscales 2 and 5) contain no duplication
of item categories and almost [...]
classification, while others (e.g., Subscales
3 and 4) reveal considerable overlap between
Hermans' classification scheme and the
factor analysis results. The first subscale
contains all of the task-tension items and all
but one of the aspiration-level items.
In summary, the Resultant Achievement
Motivation Test (Mehrabian, 1969) is complex
and multidimensional. It was analyzed
into three subscales which reflect recognizable
aspects of achievement motivation.
However, the internal consistency of the test
as a whole was relatively low, as were the
reliabilities of the subscales. Moreover, [...]
of the 22 items did not load on factors yielding
subscales which attained even a
modest level of reliability. The Prestatie
Motivatie Test (Hermans, 1970) has high
internal consistency, and at least three
subscales are reliable. Twenty-two of the 29
items are included in the five subscales.
However, there is an apparent lack of
consistency between Hermans' original
classifications and some of the subscales derived
by the present analysis.
Intercorrelations among personality, ability,
and achievement factors are listed in
Table 1. The Resultant Achievement
Motivation Test - Debilitating Anxiety subscale
was included as an index of resultant
motivation because of Weiner and Potepan’s 
(1970) finding of only a slight relationship 
between the Resultant Achievement Moti- 
vation Test and test anxiety (r = —.13), 
and their subsequent argument that the 
Resultant Achievement Motivation Test is 
in actuality a measure of Ms alone. Hermans’ 
(1970) Prestatie Motivatie Test, of course, 
was treated only as a measure of Ms. 

The correlations in Table 1 permit several
conclusions. First, if the Resultant Achievement
Motivation Test is a resultant measure
as Mehrabian (1968, 1969) intended it, it
should correlate negatively with test anxiety;
if it is a measure of Ms alone, no
correlation is expected. The significant but
modest correlation of -.24 is somewhat
equivocal in this respect, and it is not
substantially different from earlier correlations
(r = -.16 and -.26) obtained by Mehrabian
(1968, 1969). In view of this, and
particularly because there are, for the most
part, only trivial differences in the correlations
between the Resultant Achievement
Motivation Test and the Resultant Achievement
Motivation Test - Debilitating Anxiety
subscale with other factors, in subsequent
analyses and discussions, the Resultant
Achievement Motivation Test will be
treated as a resultant measure. Second, all
measures of Ms correlate with total internal
achievement responsibility and, in particular,
with internal achievement responsibility
for success. No substantial correlations were
obtained with internal achievement responsibility
for failure. Third, all Ms indices
correlate strongly with achievement, particularly
when measured by grade point averages.

The Resultant Achievement Motivation
Test correlates with the Prestatie Motivatie
Test - Debilitating Anxiety subscale (r =
.40, p < .001) somewhat more strongly than
Mehrabian (1969) reports its relationship
with the Thematic Apperception Test - Test
Anxiety Questionnaire (r = .29, p < .01), a
more traditional measure of resultant motivation.
In spite of both this and the similarities
between the two Ms measures evidenced
in Table 1, the Prestatie Motivatie
Test can be distinguished by stronger
correlations with measures of locus of control
(Intellectual Achievement Responsibility
Questionnaire, Total), approaching the
relationship of r = .64 reported by Mehrabian
(1968). In addition, the Prestatie Motivatie
Test correlates only slightly with IQ and
standard achievement tests (Comprehensive
Test of Basic Skills). The first relationship
is expected (Weiner, 1970, p. 219), and the
second is not surprising given the strong
correlation between IQ and the Comprehensive
Test of Basic Skills (r = .77, p <
.001). However, there is a pronounced
relationship between the Prestatie Motivatie
Test and grade point average, an achievement
measure which may be more sensitive
to motivational influences than standardized
tests.


TABLE 1
CORRELATIONS OF Ms AND Ms - Mar WITH
PERSONALITY, ACHIEVEMENT, AND ABILITY
TESTS FOR MALES

            Measures of Ms and Ms - Mar
Measure   RAM       RAM-DAS   PMT           PMT-DAS
DAS       -.24      -.79a     -.28          -.80a
IAR-S     [row illegible in source]
IAR-F     -.02       .00       .24           .16
IAR-T      .29**     .32**    [illegible]   [illegible]
IQ         .29**     .98***    .20           .31*
CTBS      [row illegible in source]
GPA        .48***    .51***    .56***       [illegible]

Note. Abbreviations: Ms = motive to succeed;
Mar = motive to avoid failure; DAS = Debilitating
Anxiety subscale; IAR-S = Intellectual
Achievement Responsibility, Success; IAR-F =
Intellectual Achievement Responsibility, Failure;
IAR-T = Intellectual Achievement Responsibility,
Total; CTBS = Comprehensive Test of
Basic Skills; and GPA = grade point average for
the semester data were collected.

a Correlation of a composite measure with one
component.

* p < .05.
** p < .01.
*** p < .001.


Assessment of Ps

Analyses were conducted to determine
which estimate of Ps most closely approximates
theoretical expectations and to determine
the relationship between Ps and Ms.
In regard to the first of these questions, a
repeated measures analysis of variance of
Ta scores was used to test the hypothesis
that Ms and Mar differences were greatest
when Ps = .50. The results of these analyses
are displayed in Figure 1 for each Ps estimate:
group objective, group subjective,
and individual subjective. Five levels of Ps
(0-10, 20-30, 45-55, 70-80, 90-100) were
used for the group objective. For the remaining
estimates, the extreme categories were
collapsed, resulting in three Ps levels
(difficult = .05 and .25; moderately difficult =
.50; and easy = .75 and .95).

[Figure 1, panels (a), (b), and (c)]

FIGURE 1. Preferences (Ta) of motive to succeed (Ms) and motive to avoid failure (Mar)
subjects for problems with different levels of probability of success (Ps). (In Figure 1a, the
Ps levels are as follows: 1 = 0-10, 2 = 20-30, 3 = 45-55, 4 = 70-80, and 5 = 90-100; in
Figures 1b and 1c, the levels are as follows: 1 = 0-10 and 20-30, 2 = 45-55, and 3 = 70-80
and 90-100. Abbreviations: RAM = Resultant Achievement Motivation Test and PMT-
DAS = Prestatie Motivatie Test - Debilitating Anxiety subscale.)

The 2 X 5 analysis of variance of Ta scores
using group-objective estimates of Ps (Figure
1a) yielded a significant main effect for Ps
(F = 16.40, df = 4/176, p < .001) on the
Prestatie Motivatie Test - Debilitating Anxiety
subscale and on the Resultant Achievement
Motivation Test (F = 17.22, df =
4/176, p < .001). The same analysis yielded
a significant Personality Type X Ps interaction
for the Prestatie Motivatie Test -
Debilitating Anxiety subscale (F = 4.51,
df = 4/176, p < .01) and the Resultant
Achievement Motivation Test (F = 8.10,
df = 4/176, p < .001). Although the
Personality Type X Ps interactions are significant
on both Ms measures, the curves clearly
depart from the theoretical model in several
important respects. According to pairwise
comparisons of the means, the difference
between Ms and Mar subjects is greatest at
the extreme and particularly at the difficult
Ps levels, although even there the differences
are less than reliable. In addition, Mar risk
preferences drop as Ps decreases to moderate
levels as expected (according to Newman-Keuls
stepwise comparison procedures,
p = .05); however, instead of rising as Ps
continues to drop, Mar risk preferences
slide from Ps = .50 to Ps = .00-.10 (p <
.05). The risk preferences of Ms subjects
could be described as an extremely [...]
inverted U shaped function. The central
three Ps levels are higher than the hardest
level (p < .05) and the easiest (p > [...])
one, at least with the Resultant Achievement
Motivation Test.

[The analysis using group-]subjective
estimates of Ps (Figure 1b) [yielded significant
main effects for Ps]
(F = 18.02, df = 2/88, p < .001, for the
Resultant Achievement Motivation Test;
F = 12.30, df = 2/88, p < .001, for the
Prestatie Motivatie Test - Debilitating Anxiety
subscale). Again, there were significant
Personality Type X Ps interactions with
both Ms instruments (F = 11.84, df = 2/88,
p < .001, for the Resultant Achievement
Motivation Test; F = 5.02, df = 2/88,
p < .01, for the Prestatie Motivatie Test -
Debilitating Anxiety subscale). The same
trends occurred here as noted earlier with the
group-objective estimates. The only difference
between personality types was at the
difficult Ps level on the Resultant Achievement
Motivation Test (t = 1.70, df = 88,
p < .05). According to the Newman-Keuls
procedure, Ms subjects do not differ across
Ps levels; however, there is a significant
(p = .05) increase in Mar risk preference at
each Ps level.

The analysis using the individual-subjective
estimate of Ps (Figure 1c) yielded significant
main effects for Ps on the Resultant
Achievement Motivation Test (F = 23.00,
df = 2/88, p < .001) and on the Prestatie
Motivatie Test - Debilitating Anxiety subscale
(F = 18.72, df = 2/88, p < .001).
No interaction effects were obtained.

The curves in Figure 1 and these analyses 
can be summarized as follows: 


1. Measurement of Ms - Mar with the
Resultant Achievement Motivation Test
and the Prestatie Motivatie Test - Debilitating
Anxiety subscale produces nearly
identical results.

2. On both group estimates of Ps, there is
a direct relationship between Mar and Ps,
while Ms does not differ across Ps or it is
depressed at the lower Ps limits.

3. There is a general, but nonsignificant,
tendency for the Ms and Mar curves to
cross at the extremely easy Ps level.

4. When individual-subjective estimates
of Ps are used, the Ta scores of both Ms and
Mar subjects are directly related to Ps, a
finding which departs most sharply from the
theoretical model.


The influence of Ms on subjective Ps
judgments was assessed in two ways. The
Ps ratings were summed across all 35 problems
and were correlated with the measures
of Ms and Ms - Mar. The Resultant
Achievement Motivation Test and the Prestatie
Motivatie Test both are significantly
(p < .01) related to total Ps (r = -.38 and
-.41, respectively). Correlation of total Ps
with the Prestatie Motivatie Test - Debilitating
Anxiety subscale is slightly
stronger (r = -.48). These findings imply a
general tendency for persons who are high
in Ms and particularly those who are also
low in Mar to overestimate Ps more than
persons with the opposite predispositions.

In order to determine whether the biasing
of Ps by Ms and Mar differed across the Ps
dimension, an objective-subjective Ps
discrepancy score was analyzed via a 2 X 5
analysis of variance with repeated measures
on the second factor (Ps). The discrepancy
score was computed by summing the deviations
between group-objective Ps and the
subject's Ps rating for the seven problems in
each of the five Ps categories and then adding
100 to each category sum. Thus, a score of
100 for a Ps category indicated that the
subject's subjective Ps rating matched the
group-objective difficulty of the items. Scores
of less than 100 indicate overestimation of
Ps, and larger scores indicate underestimation.
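
The scoring rule just described can be sketched as follows. The text does not state the units of the deviations, so this sketch assumes that both the objective category and the subject's ratings are expressed as category levels 1 to 5 (1 = 0-10, 5 = 90-100) and that each deviation is objective minus subjective, which yields scores below 100 for overestimation as the paragraph states. The function name and the sample ratings are hypothetical.

```python
def discrepancy_score(objective_level, rated_levels):
    """Discrepancy score for one Ps category.

    objective_level: group-objective category of the items (assumed 1..5).
    rated_levels: the subject's ratings of the seven items in that
        category, on the same assumed 1..5 scale.
    """
    deviations = (objective_level - rated for rated in rated_levels)
    return 100 + sum(deviations)


# Hypothetical subject rating the seven items of the 20-30 category (level 2).
accurate = discrepancy_score(2, [2] * 7)        # ratings match difficulty
optimistic = discrepancy_score(2, [3] * 7)      # every item rated one level easier
pessimistic = discrepancy_score(5, [4, 5, 5, 5, 5, 5, 5])
```

Adding the constant 100 simply keeps every category score positive; it does not change the differences between groups or categories.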

Analyses of variance of discrepancy scores
were conducted using the Resultant Achievement
Motivation Test and the Prestatie
Motivatie Test - Debilitating Anxiety subscale.
The results were nearly identical;
those based on the latter measure of Ms are
described below and are represented in
Figure 2. The analysis of discrepancy scores
yielded a highly significant main effect for
item difficulty (F = 238.91, df = 4/176,
p < .001) in which overestimation of Ps was
greatest when objective Ps was low. According
to the same analysis, there was significantly
less overestimation of objective Ps by
Mar than by Ms subjects (F = 47.46, df =
1/44, p < .001) and a significant Personality
Type X Ps interaction (F = 8.22, df =
4/176, p < .001).

[Figure 2: discrepancy scores (vertical axis) plotted against
group-objective Ps, from difficult to easy (horizontal axis).]

FIGURE 2. Discrepancy scores of motive to
succeed (Ms) and motive to avoid failure (Mar)
subjects at different levels of probability of
success (Ps). (Discrepancy scores over 100 indicate
overestimation of Ps and scores under 100 indicate
underestimation of Ps. The Ps levels are as
follows: 1 = 0-10, 2 = 20-30, 3 = 45-55, 4 = 70-80,
and 5 = 90-100.)

Both personality types tend to be most
accurate when the task is easier, and most
prone to overestimate when Ps is low,
although the point of accuracy (100) is closer
to the easier extreme of the Ps scale for Ms
subjects. Moreover, Mar subjects underestimate
Ps on very easy tasks (Ps = 90-100)
compared to Ms subjects (t = 2.24, df = 44,
p < .05), while Ms persons overestimate Ps
on moderately difficult (Ps = .50) and difficult
tasks (Ps = 20-30) significantly more
than Mar persons (t = 2.89, df = 44, p <
.05, and t = 2.65, df = 44, p < .05,
respectively).


Discussion 


In many respects, the Resultant Achieve- 
ment Motivation Test (Mehrabian, 1969) 
and the Prestatie Motivatie Test (Hermans, 
1970) perform similarly and satisfactorily 
with a junior high school population. The 
relationship between the two is relatively 
strong compared to correlations among ob- 
jective Ms measures reviewed by others 
(Clarke, 1973; Weinstein, 1969). Both of 
these instruments as well as the resultant 
version of the Prestatie Motivatie Test cor- 
relate positively with locus of control and, 
in particular, with internal achievement re- 
sponsibility for success, implying a measure 
of validity for both tests (Hermans, 1970; 
Mehrabian, 1968; Weiner & Potepan, 1970). 
Moreover, the inverse relationship with test 
anxiety is significant but slight; on the other 
hand, positive correlations are substantial 
with measures of achievement. 

Both instruments are multidimensional as 
is to be expected of measures of Ms. The 
number of factors and variance accounted 
for by the first factor of the Resultant 
Achievement Motivation Test compare 
closely with Mehrabian’s (1968) factor 
analysis of the earlier, 34-item version. The 
present factor analysis of the Prestatie Mo- 
tivatie Test provided an opportunity to 
match Hermans' a priori factors with
empirically derived factors. The success of the
matching was clearly uneven. The Resultant
Achievement Motivation Test is marred by
relatively low internal consistency which, to
our knowledge, had not been assessed
previously. Mehrabian (1968) has reported high
reliability when stability over time was used
as an estimate. The internal consistency of
the Prestatie Motivatie Test obtained in the
present study is high and slightly stronger
than that reported by Hermans (1970).
Moreover, the Prestatie Motivatie Test has
the advantage of more stable subscales.
Thus, the present findings related to the
reliability and validity of the two objective
measures are cause for guarded optimism
regarding their use with school children,
particularly in view of the inadequacies of other
objective instruments (Weinstein, 1969).

The relationship between Ms and Mar
and Ps is not in complete accord with
theoretical expectations (Figure 1), a point to be
discussed later. Nevertheless, the closer
approximation to expectations for the Ms
persons when group standards of Ps are used
is particularly interesting. Achievement
motivation may be susceptible to social
influences from gamelike tasks which generate
social comparisons and competition (Maehr
& Sjogen, 1971). In respect to the behavior
of Ms subjects, at least, social influences may
extend to the definitions of Ps and,
accordingly, to Is and If. Thus, Ps may be [...]
defined by standards of difficulty set by
others, and Is and If defined by pride or
shame in accomplishments due to how the
subject feels he is perceived in the eyes of
others. An excellent ringtosser may take
considerable pride in making a shot which is
relatively easy for him but markedly difficult
for others, while a poor ringtosser may feel
a great shame in missing a shot which is easy
for others but difficult for him. The better
approximation of the model by Ms subjects
using group definitions of Ps [suggests that]
social influences such as those described
above are not extrinsic to achievement
motivation but are an integral part of it, at
least for ninth graders. The present results
with adolescents underscore Birney, Burdick,
and Teevan's (1969) concern for a
sharper delineation of individual and [social]
factors in determining Ps and suggest an
investigation of developmental changes in
the relative contribution each factor makes
to the subjective definition of Ps.

Even when group standards are used to
estimate Ps, the obtained curves (Figures
1a and 1b) depart from the theoretical
model and from other findings (deCharms
& Carpenter, 1968; Isaacson, 1964) in at
least three respects: (a) Mar subjects avoid
difficult tasks; (b) Mar subjects do not avoid
moderately difficult tasks; and (c) Ms
subjects' preference for Ps = .50 tasks is not as
pronounced as expected. Unlike deCharms
and Carpenter (1968), the procedures of this
study did not suggest a gamelike task nor
did they build in an inverse relationship
between Ps and Is. In addition, extrinsic
factors such as the importance of grades may
have been more at play in the present study.
Even so, the failure of Mar subjects to avoid
moderately difficult tasks and to approach
extremely difficult tasks has been observed
on other occasions (Maehr & Sjogen, 1971).
The clear-cut positive relationship between
Ta and Ps for Mar persons suggests that
they may have treated the problems as
matters of chance. The strong correlation
between Ms and internal achievement
responsibility for success, in particular,
suggests this may be the case.

Atkinson and Feather (1966) view the
influence of Ms and Mar on Ps as a critical
problem for achievement motivation theory.
It is certainly an important issue in the
application of that theory to educational
problems. The significant, positive correlation
between Ms and Ps confirms the lack
of independence between these two factors
when it comes to initial judgments of task
difficulty (Atkinson, 1964, 1966; Brody,
1963; Moulton, 1965). The results of the
analyses of variance also indicate greater
overestimation of Ps by Ms than by Mar
subjects, an expected finding given the positive
correlation between the two variables.
But, contrary to Moulton's (1965) suggestion
that Mar subjects underestimate objective
Ps, both groups overestimated.

The relationship between the two personality
types and Ps is made more complex
by the additional finding that biasing is not
constant across levels of Ps. The biasing of
Ps does not follow one trend suggested by
Feather (1965), namely, that overestimation
of Ps by Ms and underestimation of Ps by
Mar persons are greatest when achievement
motivation is greatest, that is, at the Ps =
.50 level. Overestimation of Ps by Ms persons
is greatest for the more difficult problems
which Mar persons also overestimate.
Brody (1963) proposed what amounts to a
contrary relationship between Ms, Mar, and
Ps. He argued that Ms and Mar persons distort
their estimates of Ps to maximize their
performance. If this is the case, Ms persons
bias task difficulty so that all tasks are
perceived as close to Ps = .50 as possible.
Accordingly, they judge Ps most accurately
when Ps = .50 and tend to overestimate Ps
on difficult tasks and underestimate Ps on
easy tasks. If we can assume that Mar subjects
also want to maximize resultant motivation
as Brody suggests, they most accurately
judge easy and difficult tasks and
either underestimate or overestimate
moderately difficult tasks. The present results are
clearly inconsistent with this view.
Overestimation of Ps by Ms subjects is relatively
pronounced at Ps = .50 and least when Ps
is high. For their part, Mar persons are least
accurate at the extreme Ps levels,
underestimating Ps on easy problems and
overestimating Ps on difficult ones. Their
accuracy is greatest at an intermediate Ps level.

Finally, Feather (1965) proposed subjective
attractiveness of success (Ms X Is) and
subjective repulsiveness of failure (Mar X
If) as constructs which may explain the
influence of Ms and Mar on Ps. The present
results are most, but not entirely, consistent
with this interpretation. The Ms X Is
formulation implies greater attractiveness of
success and consequently greater overestimation
of Ps with decreases in Ps and the
accompanying increases in Is. Thus, Ms persons
are most accurate on the easiest tasks
and most optimistic (i.e., overestimate Ps)
on difficult tasks. This view seems to predict
the Ps judgments of Ms persons, except on
the extremely low Ps level where a slight
deviation occurs (Figure 2).

Subjective repulsiveness of failure for Mar
subjects is greatest on easy tasks when both
Ps and If are high, and it diminishes as
tasks become more difficult. Accordingly,
Mar persons are most accurate when the
task is difficult and most pessimistic (i.e.,
underestimate Ps) when the task is easy.
The present findings suggest such a relationship,
but as the curves in Figure 2 indicate,
the effect is not as strong as the hypothesis
requires. Although Mar subjects underestimate
the Ps of easy tasks significantly
more than Ms persons and Ms persons
overestimate Ps in the difficult condition (Ps =
20-30) more than Mar subjects, the most
accurate estimate of Ps by Mar subjects is
not at the most difficult point on the Ps
scale, as this view would predict. Nevertheless,
in this important and relatively unexplored
area of achievement motivation, the
notions of subjective attractiveness of success
and repulsiveness of failure seem to
accommodate the present data best, or at
least well enough to merit further investigation.
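
The direction of the two predictions discussed above can be made concrete with a small sketch. It assumes the conventional relation Is = 1 - Ps, and it takes the magnitude of If to be Ps itself so that If is high when Ps is high, as the text describes; neither formula is stated explicitly in this article, and the function names and motive strengths are hypothetical.

```python
# Category midpoints for the five Ps levels used in the study.
PS_LEVELS = [0.05, 0.25, 0.50, 0.75, 0.95]


def attractiveness_of_success(ms, ps):
    """Ms X Is, assuming Is = 1 - Ps (an assumption, not from the text)."""
    return ms * (1.0 - ps)


def repulsiveness_of_failure(mar, ps):
    """Mar X If, assuming the magnitude of If equals Ps (an assumption)."""
    return mar * ps


def predicted_bias_peaks(ms, mar):
    """Ps levels where overestimation and underestimation should peak."""
    over = max(PS_LEVELS, key=lambda ps: attractiveness_of_success(ms, ps))
    under = max(PS_LEVELS, key=lambda ps: repulsiveness_of_failure(mar, ps))
    return over, under


# Hypothetical unit motive strengths.
peak_over, peak_under = predicted_bias_peaks(ms=1.0, mar=1.0)
```

Under these assumptions the formulation places the greatest overestimation at the most difficult level and the greatest underestimation at the easiest level, which is the pattern against which Figure 2 is compared.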

The findings of the present study suggest 
that application of achievement motivation 
theory to educational practice is hampered 
not only by the assessment of Ms, but per- 
haps even more so by problems associated 
with the estimation of Ps. The latter factor
may possibly be subject to influences of Ms,
of social factors, and of the developmental
level of the learner, thereby making its 
manipulation by the teacher most difficult. 
COURSE EVALUATION:
WHEN?¹


NEIL A. CARRIER, GEORGE S. HOWARD, AND WILLIAM G. MILLER


Southern Illinois University 


Support was found for two hypotheses which declared that students 
attending the last regular meeting of a college course in introductory 
psychology give more favorable instructor and course evaluations than 
those attending the final examination only. Data pertinent to a third 
hypothesis suggested that, relative to the last-meeting evaluations, the 
final examination context has little or no effect on ratings. 


Student evaluation of college courses and
instructors has certainly increased in popu- 
larity in the last decade. Although Gustad 
(1967) reported a decline in its systematic 
use, possibly due to suspicion about its 
validity, pressures from students (Werdell, 
1967) will probably see it increase during 
the seventies. The excellent review by 
Costin, Greenough, and Menges (1971) ap- 
pears to support this prediction. 

Probably the most common practice is to 
administer an evaluation form at the last 
regular meeting of a class. Administration at 
the final examination is another possibility 
but may be avoided for fear that the context 
will sour student attitudes. 

However, a problem may exist with the 
first procedure when class attendance is not 
compulsory. It is implied by this question: 
How representative a sample of the total 
class population is the group of students who 
attend the last class meeting? These faithful 
attenders may have characteristics different 
than those of the less faithful nonattenders. 
If so, the omission of the latter (one hy- 
pothesis) may give spuriously favorable 
evaluations. With this in mind a simple 
study was designed. 

The plan was to administer an evaluation 
form on the last day of class and again at 
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the final examination. Accordingly, three 
data groups would result. First would be the 
group of students who filled out the evalu- 
ation form on the last day of class—the 
last-day-of-class group. Second would be 
the same students who completed the form 
at the final examination—the final-examina- 
tion group. Third would be those students 
who completed the form at the final ex- 
amination only—the final-examination-only 
group. By proper comparisons of data, three 
intuitively based hypotheses could be tested. 

Hypothesis 1. A comparison of ratings 
from the last-day-of-class and final-exami- 
nation-only groups shows that the last-day- 
of-class group provided the more favorable 
evaluation. Implicit here is the assumption 
that the last-day attenders are a more favor- 
ably disposed sample than are the final-ex- 
amination-only attenders. 

Hypothesis 2. A comparison of ratings
from the final-examination and final-exami- 
nation-only groups shows that the final-ex- 
amination group provided the more favora- 
ble evaluation. In effect, this comparison 
tests the same assumption as Hypothesis 1, 
using the same two samples of subjects, but 
has the possible advantage over the Hy- 
pothesis 1 check of controlling for the con- 
text in which the ratings were performed. 

Hypothesis 3. A comparison of ratings 
from the last-day-of-class and final-exami- 
nation groups shows no systematic depres- 
sion of ratings in the final examination situ- 
ation. Behind this hypothesis is the belief 
that the situational influence of the final examination
is not a very important contributor to ratings.

TABLE 1
FIRST-ROOT VECTOR OF EACH OF THE 19 QUESTIONNAIRE
ITEMS FOR GROUPS FINAL EXAMINATION
AND FINAL EXAMINATION ONLY

Item                                              Correlation

 1. Value of course                                   .48
 2. Quality of lectures                               .51
 3. Lecturer's knowledge                              [illegible]
 4. His interest                                      .68
 5. Clarity of his explanations                       .51
 6. His voice quality                                 .13
 7. His use of audiovisual aids                       .42
 8. Integration of course topics                      .31
 9. Coordination of lectures, readings,
    laboratory materials                              .25
10. Quality of laboratory instructor                 -.12
11. Value of laboratory sessions                     -.1
12. Reading material interest                         .25
13. Fairness of tests                                 [illegible]
14. Course difficulty                                 [illegible]
15. Intellectual stimulation from lectures            .47
16. Amount of reading                                 .26
17. Difficulty of reading                             .33
18. Number of tests                                  -.01
19. Amount of studying done for course                .24


METHOD


Subjects and Final Examination 


Subjects were students in a sophomore-level 
course in introductory psychology at the largest 
campus (approximately 20,000 students) of a mid- 
western state university. The final examination 
was a one-hour, 70-item multiple-choice test which 
was approximately 80% on material since the pre- 
vious hourly and 20% on prior material. 


Evaluation Form 


The course and instructor evaluation form is 
one that was used for several years in the course. 
It consists of 19 items, the first 13 concerned with 
“qualitative aspects” and the remainder with 
“quantitative aspects.” The form provides a frame 
of reference by asking the student to compare the 
course “with other similar lower division courses
you have had.” On the qualitative items the
choice is made from a 3-point scale bearing the
labels “better,” “about the same,” and “poorer.”
The quantitative item labels are “more,” “about
the same,” and “less.” Table 1 gives shortened
versions of the items.


Procedure 


The form was administered at the last regular
lecture meeting of two large lecture groups. To
permit a comparison of these students’ evaluations
with those they would provide at the final examination,
a coding system unique to each student was
explained and used. Approximately 6% of the
forms were unusable as a result of spoilage or inconsistent
code usage. The form was administered
again one week later, immediately prior to the final
examination: obviously for the second time for
some (Group Final Examination) and for the
first time for others (Group Final Examination
Only). Scoring was done by giving a scale value
of 3 to a “better” or “more” response and a value
of 1 for a “poorer” or “less” choice.

Because of the variety of scale topics, no
summation of scores across items could be rationally
done. Consequently, the hypothesized differences
between the groups of data were investigated
by means of a multiple discriminant analysis. This
technique was employed as a way of characterizing
group differences rather than as a means of classification,
which is its more common use (Bock &
Haggard, 1968).


RESULTS


The multiple discriminant analysis (Veldman,
1967) was performed comparing the
ratings of Group Last Day of Class (n =
221) with the ratings of Group Final Examination
Only (n = 174) to test Hypothesis 1.
Analysis over the 19 items showed the
groups to be significantly different (F =
2.97, df = 19/375, p = .0001). (The centroids
of the discriminant space generated in this
analysis were 2.894 for Group Last Day of
Class and 2.499 for Group Final Examination
Only.) Inspection of the data revealed
that on 15 of the 19 items, Group Last Day
of Class gave more favorable ratings than
Group Final Examination Only, as predicted
by Hypothesis 1.

A similar analysis comparing the final
exam ratings of Group Final Examination
(n = 221, the same students as Group Last
Day of Class) and those of Group Final Examination
Only (n = 174) also showed
significant differences (F = 2.98, df = 19/375,
p = .0001). (In this case, the centroids
were 3.747 for Group Final Examination
and 3.325 for Group Final Examination
Only.) Inspection showed more favorable
ratings by Group Final Examination than
by Group Final Examination Only on 1[illegible] of
the 19 items. Thus, Hypothesis 2 was
supported.

In view of the great similarity in these
two comparisons, the remainder of the results
deals only with the course evaluations
completed on the day of the final examination
(Groups Final Examination and Final
Examination Only). The trace (sum of the
roots) of the matrix formed as the ratio of
the among-groups to the pooled, within-
groups variance-covariance matrices was
found to be .1512. The first root of this
product matrix accounted for 99.98% of the
trace. Since the trace is a measure of variance
to be accounted for by the roots (where
the roots satisfy the determinantal equation
|W^-1 A - λI| = 0), one root sufficiently
accounted for the available variance. This
is, of course, the expected result when two
groups are analyzed (Cooley & Lohnes,
1962).
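
The one-root result for two groups can be illustrated with a minimal sketch. It assumes NumPy, uses sums-of-squares-and-cross-products matrices in place of variance-covariance matrices (which only rescales the roots), and runs on synthetic continuous ratings for five hypothetical items rather than the study's data. The function name `discriminant_roots` is an illustrative choice, not something taken from the source.

```python
import numpy as np

def discriminant_roots(group_a, group_b):
    """Return the roots (eigenvalues) of W^-1 A for two groups, largest first."""
    groups = [np.asarray(group_a, float), np.asarray(group_b, float)]
    grand_mean = np.vstack(groups).mean(axis=0)
    n_items = grand_mean.size
    W = np.zeros((n_items, n_items))  # pooled within-groups SSCP matrix
    A = np.zeros((n_items, n_items))  # among-groups SSCP matrix
    for g in groups:
        centered = g - g.mean(axis=0)
        W += centered.T @ centered
        diff = (g.mean(axis=0) - grand_mean)[:, None]
        A += len(g) * (diff @ diff.T)
    # Solve W x = A instead of forming an explicit inverse.
    roots = np.linalg.eigvals(np.linalg.solve(W, A)).real
    return np.sort(roots)[::-1]

rng = np.random.default_rng(0)
# Synthetic ratings (assumption): 221 and 174 respondents, 5 items.
fe = rng.normal(2.5, 0.6, size=(221, 5))
feo = rng.normal(2.4, 0.6, size=(174, 5))
roots = discriminant_roots(fe, feo)
share = roots[0] / roots.sum()  # first root as a share of the trace
print(f"first root share of trace: {share:.4f}")
```

With only two groups the among-groups matrix has rank one, so every root after the first is zero up to rounding error, which is why a share near 100% is expected rather than informative.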

Table 1 presents the loadings of each of 
the 19 questionnaire items with the first 
root’s corresponding vector (discriminant 
function). As would be expected from the 
diverse coverage of the items, there are great 
differences in how heavily each item loads 
on the discriminant function.

Table 2 presents the results of the uni- 
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variate F tests for the 19 items. In these 
comparisons, the between and within de- 
grees of freedom are 1 and 393, respectively. 
Finally, to shed light on the third hy- 
pothesis, Pearson correlations of test-retest 
reliability for the subjects in the last-day- 
of-class and final-examination groups were 
run for each questionnaire item. These are 
shown in Table 3. The range of these cor- 
relations is from .76 to .44. In view of the 
restricted range (1-3) of the item scales, 
these correlations are of reasonable magni- 
tude. Also in Table 3 are the means on each 
item of Groups Last Day of Class and 
Final Examination. Inspection shows that 
although the Group Last Day of Class 
means are higher on 13 items and Group 
Final Examination means are higher on 5 
(with 1 tie), the differences are generally 
minuscule and none approaches significance.
Support for Hypothesis 3 seems indicated. 
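
As a rough illustration of the item-by-item test-retest correlations, the sketch below computes a Pearson r for each item across two administrations. It assumes NumPy; the ratings are invented 1-3 scale values for six hypothetical students and two items, and `item_reliabilities` is an illustrative name.

```python
import numpy as np

def item_reliabilities(first, second):
    """Pearson r between two administrations, computed separately per item (column)."""
    first = np.asarray(first, float)
    second = np.asarray(second, float)
    return np.array([np.corrcoef(first[:, j], second[:, j])[0, 1]
                     for j in range(first.shape[1])])

# Hypothetical ratings on the 1-3 scale: rows are students, columns are items.
last_day = np.array([[3, 2], [3, 3], [2, 1], [1, 2], [3, 3], [2, 2]])
final_exam = np.array([[3, 2], [2, 3], [2, 1], [1, 1], [3, 3], [2, 3]])
r = item_reliabilities(last_day, final_exam)
print(np.round(r, 2))
```

Because each item takes only three values, a single changed response moves r noticeably, which is consistent with the remark that moderate coefficients are reasonable for a 1-3 scale.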


Discussion 


In effect, this study was a replication of 
an unpublished one performed by one of the 


TABLE 2 


RESULTS OF THE UNIVARIATE F TESTS FOR THE 19 QUESTIONNAIRE ITEMS FOR GROUPS FINAL
EXAMINATION (FE) AND FINAL EXAMINATION ONLY (FEO)


17. Difficulty of reading 


Group FE Group FEO 
Item ii 
x m M SD 
1. Value of course ae 25 s v epe 
2. Quality of lectures 2.69 ae Hrs js 11.51** 
3. Lecturer's knowledge 2.77 4 2 2 2 Vigo 
4. His interest 2.79 ve 25 : 1e" 
5. Clarity of his explanations 2.67 aa gee “59 ‘ol 
6. His voice quality ABS b nce PE C 
7. His use of audiovisual aids Ad 25 -58 4.87* 
* Integration of course topics " 2.51 | -52 , À y 
m inati i ‘ator, 
Coordinabigy of lectures, readings, laboratory 2.47 57 2.36 a d 
10. Quality of laboratory instructor 2.43 i HS “66 ‘61 
11. Value of laboratory sessions 2.30 s 2.20 06 3.29 
12. Reading material interest. 2.31 .6 2.15 “60 1.56 
13. Fairness of tests 2.23 a 2.57 ^53 ‘62 
14. Course difficulty 2.61 | - 2.32 | .61 | 11.50** 
i Intellectual stimulation from lectures 2m yn 2.00 4 3.48 
. Amount of reading 2.45 .5l 2.32 
2.33 2.33 
2.63 2.53 


18. Number of tests 
19. Amount of studying done for course 


TABLE 3
MEANS AND PEARSON CORRELATION RELIABILITY COEFFICIENTS FOR GROUPS LAST DAY OF CLASS
(LD) AND FINAL EXAMINATION (FE) ON THE 19 QUESTIONNAIRE ITEMS

Item                                                  Group LD   Group FE   F(a)    r

 1. Value of course                                     2.61       2.04      .55   [illegible]
 2. Quality of lectures                                 2.71       2.69      .15   [illegible]
 3. Lecturer's knowledge                                2.78       2.77      .05   [illegible]
 4. His interest                                        2.76       2.79      .80    .53
 5. Clarity of his explanations                         2.04       2.67      .45    .44
 6. His voice quality                                   2.61       2.58      .27    .60
 7. His use of audiovisual aids                         2.83       2.78     1.36    .52
 8. Integration of course topics                        2.43       2.51     1.97    .92
 9. Coordination of lectures, readings, laboratory
    materials                                           2.46       2.47      .06    .45
10. Quality of laboratory instructor                    2.43       2.43      .01   [illegible]
11. Value of laboratory sessions                        2.26       2.30      .40    .66
12. Reading material interest                           2.22       2.31     2.34    .48
13. Fairness of tests                                   2.19       2.23      .29    .55
14. Course difficulty                                   2.56       2.61      .94    .66
15. Intellectual stimulation from lectures              2.53       2.52      .07    .50
16. Amount of reading                                   2.65       2.70     1.40    .56
17. Difficulty of reading                               2.38       2.45     1.62   [illegible]
18. Number of tests                                     2.27       2.33     1.41   [illegible]
19. Amount of studying done for course                  2.00       2.63      .26    .16

(a) None of the F values in this column approaches significance.


authors a year previously. Although more
appropriate statistical procedures were employed
in the present effort, similar findings
are apparent. The confirmations of Hypotheses
1 and 2 suggest that in a course
with voluntary attendance the last-day attenders
are more disposed to give favorable
evaluations than the group that attends the
final examination only. The implications of
this, for determining which procedure to follow
in obtaining the most favorable evaluations,
now present a challenge to the instructor’s
conscience.

The evidence pertinent to Hypothesis 3
supports the contention that the final-examination
context does not have an appreciable
depression effect on ratings. The implication
of this for the instructor who also wishes to
get the most representative sample of
student reactions is to give evaluation forms
at the final examination with relative impunity.

However, a few caveats are in order. The
comparison of Group Final Examination
and Group Final Examination Only data
(Hypothesis 2) was performed to control
for the possibility that the differences between
Group Last Day of Class and Group
Final Examination Only data (Hypothesis
1) could have been due to negative feeling
induced by the testing situation. This control
may not have been completely effective:
It might be argued that student responses in
the final-examination situation were made
with an effort on their part to be consistent
with the last-day-of-class responses which
they made a week before. A replication of
this study to remove this possible influence
could easily be done: simply record by the
coding system the students attending the
last class period but don't administer the
evaluation form, then give the form at the
final examination to all present and obtain
the coded information again. Analysis would
then contrast the evaluations of students
who had attended the last class meeting with
those of students who had not. Any possible
consistency bias would thereby be eliminated.
The test-retest reliability information
would be lost, however.

A second variable of possible significance
to Hypothesis 3 might be the degree of
difficulty of the final examination. After all,
the examination here was rather noncomprehensive
and only one hour in length.
Would the prospect of facing a more
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comprehensive examination of two or three 
hours duration depress ratings substantially 
more than was apparent here? 

Finally, a further refinement of research 
on evaluations would be to partial out vari- 
ables that previous research suggests may 
have potency (e.g., expected grades: Costin, 
Greenough, & Menges, 1971) before con- 
trasting the centroids of the treatment 
groups. 
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The use of instructional objectives to 
direct learners to relevant portions of a text 
has been shown to enhance objective-rele- 
vant (intentional) learning (Duchastel & 
Merrill, 1978, for a partial review; Kaplan, 
1973; Kaplan & Rothkopf, 1974; Roth- 
kopf & Kaplan, 1972). The objectives typi- 
cally used in these studies direct the learner 
to relevant material but do not repeat the 
relevant information. This is similar to 
providing the learner with unanswered ques- 
tions. In both cases, the learner then must 
search the text for the information relevant
to the objective or question. Frase (1968, 
1970) suggested that questions located prior 
to text segments serve as orienting stimuli 
which influence the learners’ inspection of 
the text. Frase, and later Patrick (see Frase, 
1973) and Boyd (1973), suggested that

questions used in this way might be the 
occasion for selective attention to relevant 
text items, namely, incidental learning 


1 We are greatly indebted to E. M. Burgin for 
her valuable assistance throughout the experiment. 

? Requests for reprints should be sent to Robert 
Kaplan, Bell Laboratories, CPI L-101, P.O. Box 
2020, New Brunswick, New Jersey 08903. 


might be depressed with prequestions under 
some conditions. 

Other studies investigating the use of ob- 
jectives or questions after text segments 
suggested that they may function as a sum- 
mary or review rather than as orienting 
stimuli (Bruning, 1968; Frase, 1967; Roth- 
kopf, 1966). These studies also showed that 
intentional learning was greater when the 
answers to questions were provided than 
when no answers were given. In general, 
overall retention tended to be greater when 
questions were located after rather than 
before text segments. The distinguishing 
feature of these studies is that the objectives 
or questions presented after text segments 
could not be used as orienting stimuli which 
permit the learner to selectively attend to 
specific information during inspection of the
text. Frase (1967) suggested that increased 
performance under these conditions could 
result from several factors: (a) review, (b) 
repetition of relevant material, and (c)
practicing test-like events. 

The relative effects of objectives or questions
located before or after a text are less
clear with respect to incidental (nonobjec-
tive-relevant) learning. Several studies have 
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TABLE 1 
HYPOTHESIZED STRATEGY, CONSEQUENCE, AND OUTCOME OF TREATMENTS

                                                                          Suggested consequence             Predicted outcome(a)
Objective treatment           Hypothetical strategy                       Repetition  Selection  Search     Intentional  Incidental

No objective (reference       read text nonselectively                    no          no         no               =(b)
  group)
Before/without information    read objectives; search text                partial     yes        yes        >            <
                                selectively for answers
Before/with information       read objectives with information;           no          yes        no         =            <
                                ignore text
After/with information        read text nonselectively; read              yes         no         no         >            =
                                objectives with information
After/without information     read text nonselectively; read              partial     no         no         =            =
                                objectives without information

(a) Comparison between treatment and reference groups.
(b) Comparison between test item sets for reference group.


reported greater incidental learning for 
groups receiving objectives or questions than 
for nonobjective control groups (Bruning, 
1968; Frase, 1967; Rothkopf, 1966; Roth- 
kopf & Bisbicos, 1967). Frase (1967) and 
Rothkopf (1966) both found greater incidental
learning when questions were presented after
rather than before text segments. The previously
mentioned Kaplan
and Rothkopf (1974) and Rothkopf and 
Kaplan (1972) studies found varied effects 
for incidental learning when objectives were 
presented prior to text. 

Based upon the results of all the studies 
previously discussed, it would seem that at 
least three processes influence the subjects’ 
learning with objectives (repetition, selec- 
tion, and search). The primary purpose of 
the present study was to use these constructs 
to predict learning outcomes. This was done 
by determining the learning effects of ob- 
jectives presented prior to or after a text 
when the objectives were written with or 
without relevant information. An objective 
with information is analogous to a question
containing an answer, while an objective 
without information is analogous to a ques- 
tion not containing an answer. It is possible, 
then, to hypothesize specific inspection 
strategies that subjects may adopt for each 
of the four treatments and for a reference 


group receiving no objectives (Table 1). 
The three hypothetical processes that allow 
these predictions are repetition, selection, 
and search. Each strategy is associated with 
a suggested consequence in terms of the 
processes employed and the subsequent pre- 
dicted outcomes. It is assumed that (a) 
repetition will result in increased learning 
of the repeated information and in increased 
inspection time; (b) selection will result in 
increased learning of selected information 
(because there is less information to attend 
to), decreased learning of information not 
selected (incidental learning), and in a 
reduction of inspection time; and (c) search 
will result in increased learning of informa- 
tion not selected and in increased inspection 
time, both relative to selection alone. 

The reference group is hypothesized to use 
a strategy that gives equal attention to each 
text sentence (i.e., nonselective text inspec-
tion). The suggested consequence of this 
strategy is that the subject would not have 
the opportunity for repetition (of objective 
and then text sentence), selection (of only 
relevant material for study), or search (for 
relevant sentences within the text). The 
predicted outcome would be equal perform- 


* Appreciation is extended to L. T. Frase for his
contribution to the information in Tables 1 and 2.
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ance on material that was used to measure 
intentional and incidental learning for the
treatment groups (Table 1). When objectives
without relevant information are presented
prior to text (i.e., available concurrently
with the text) and the subject is
instructed that the test will only pertain to 
objective-relevant material, an efficient 
strategy would be for the subject to read the 
objectives and then search the text to select 
relevant information. The consequence of 
this strategy would involve partial repeti- 
tion (the objective is repeated as part of the 
text sentence), search (searching the text for 
relevant sentences), and selection (focusing 
attention on objective-relevant text sen- 
tences). Based upon these consequences, the 
treatment group’s intentional learning is 
predicted to be greater than the reference 
group’s performance on the same test items. 
The treatment groups’ lack of attention to 
nonobjective-relevant material and the
reference group's equal attention to this 
material is the basis for predicting treat- 
ment group incidental performance to be 
less than that of the reference group (Table 
1). When objectives with relevant informa- 
tion are presented prior to text, the strategy 
of attending to the objectives but not to the 
text may be most efficient. The consequence
of this strategy would involve the process 
of selection (i.e., select objectives only for 
study) but not the process of repetition (i.e., 
no repetition of objective and then text 
sentence) or search (i.e., no text search).
Although the treatment group's use of selec- 
tion reduces the amount of material at- 
tended to, the difference was not considered 
large enough to produce significant learn- 
ing differences. Further, neither the treat- 
ment nor the reference group benefits from 
repetition or search. Therefore, the treat- 
ment group's performance on objective- 
relevant material is predicted to be about 
equal to that of the reference group. Con- 
versely, the treatment group's nonobjective- 
relevant learning is predicted to be less than 
the reference group's because of the treat- 
ment group’s lack of attention to this mate- 
rial (Table 1). When objectives with 
information are presented after text inspec- 
tion (not concurrently), the text must be 


read nonselectively (same as control group). 
Then the objectives with information are 
read as a summary/review of relevant in- 
formation. The suggested consequence of 
this strategy is the use of repetition but not 
selection or search. Due to the repetition 
advantage, the treatment group is predicted 
to learn more objective-relevant material 
than the reference group. Because both 
groups read the nonobjective-relevant mate- 
rial nonselectively, they are predicted to 
perform about equally on this material 
(Table 1). Finally, when objectives without 
information are presented after text inspec- 
tion, the text is again read nonselectively, 
and the subsequent objectives without in- 
formation provide only a partial repetition 
of relevant text sentences, one which is not 
sufficient to produce learning. Neither selec- 
tion nor search is available for this treat- 
ment. Hence, the predicted outcome is for 
equal performance on intentional and in- 
cidental items for the treatment and refer- 
ence groups (Table 1). 
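
The predictions described above can be collected in a small lookup structure. The sketch below is only a restatement of the text in Python; the `Prediction` type, the `PREDICTIONS` mapping, and the helper `treatments_where` are illustrative names, and the symbols ">", "=", and "<" compare each treatment group with the reference group.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    repetition: str   # "yes", "partial", or "no"
    selection: str    # "yes" or "no"
    search: str       # "yes" or "no"
    intentional: str  # treatment vs. reference: ">", "=", or "<"
    incidental: str   # treatment vs. reference: ">", "=", or "<"

# One entry per objective treatment, following the reasoning in the text.
PREDICTIONS = {
    "before/without information": Prediction("partial", "yes", "yes", ">", "<"),
    "before/with information":    Prediction("no", "yes", "no", "=", "<"),
    "after/with information":     Prediction("yes", "no", "no", ">", "="),
    "after/without information":  Prediction("partial", "no", "no", "=", "="),
}

def treatments_where(field, value):
    """List the treatments whose named field equals the given value."""
    return [name for name, p in PREDICTIONS.items() if getattr(p, field) == value]

print(treatments_where("incidental", "<"))
print(treatments_where("intentional", ">"))
```

Laid out this way, the pattern is easy to see: depressed incidental learning is predicted only where selection is available (objectives before text), and enhanced intentional learning only where repetition is full or is partial but supported by search.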

Several predictions can be made for the 
amount of inspection time used for each of 
the above treatments. Objectives presented 
before text are predicted to result in less in- 
spection time than objectives presented 
after text. This prediction is made because 
the objectives-after-text treatments re- 
quire time, first, to read the text nonselec- 
tively and, then, require additional time for 
the second task of inspecting the objectives 
as a summary/review. Objectives before text
are predicted to result in slightly greater in- 
spection time when they are presented with- 
out than with information. This prediction 
is based upon the additional search time re- 
quired when the objectives do not contain 
information. Conversely, objectives after 
text are predicted to result in slightly 
greater inspection time when they are pre- 
sented with than without information. This 
prediction was made because objectives 
serve as a better summary/review when 
they are presented with than without information;
thus, the learner is expected to
spend more time using the objectives with
information. Finally, all of the treatment
groups are predicted to result in greater
inspection time than the reference group, with
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the exception of the objectives-with-infor- 
mation-before-text group. This is predicted 
because all of the former treatments have 
the combined task of inspecting objectives 
and text. However, the latter treatment
group inspects objectives only; thus, it
has to attend to less information than the
reference group.

A more specific goal of the present experi- 
ment, then, is to test these predictions. 


METHOD 


Materials 


The two experimental passages were used in 
previous studies (Kaplan, 1973; Kaplan & Roth- 
kopf, 1974; Rothkopf & Kaplan, 1972). The pas- 
sages were segments of two textbooks prepared by 
the Systems Training Department, Bell Lab- 
oratories, Piscataway, New Jersey. The con-
tent of the passages pertained to printer specifica- 
tions for designing forms (Passage 1) and to an in- 
troduction to business information systems (Pas- 
sage 2) and comprised 842 and 1,091 (X = 967) 
words in 60 and 54 (X = 57) sentences, respec- 
tively.

Two types of objectives were prepared for this 
study (without and with information). Objectives 
without information were the same as the specifi- 
cally phrased objectives used in the above men- 
tioned Kaplan and Rothkopf studies. A specifically 
phrased objective was defined as having been em- 
pirically determined to require the subject to learn 
only one passage sentence (for details, see Roth- 
kopf & Kaplan, 1972). The distinguishing feature 
of this type of objective was that it did not contain 
relevant text information; that is, the objective 
told the subject what to learn but did not reveal 
the information necessary to answer a test question 
about that objective. An example of an objective 
without information is, “Learn how many years 
75% to 100% rag paper will last.” Conversely, a
matching set of objectives was prepared that did 
contain the relevant text information necessary to 
answer test questions about those objectives. An 
example of an objective with information that cor- 
responds to the previous example is, “Learn that 
75% to 100% rag paper will last 30 to 40 years.” 
The number of matching objectives, with and with- 
out information, for both passages, was 36 and 33
(X = 34.5). Thus, the proportion of objective-
relevant passage sentences (34.5) to total passage
sentences (57) was about 60%; that is, the sub-
jects were instructed to learn 60% of the passage, 
which was equivalent to “density-60%” in the pre- 
vious studies. 
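
The averages and the "density" figure reported above follow directly from the passage counts. A quick arithmetic check, using only the numbers given in the text (variable names are illustrative):

```python
from statistics import mean

words = [842, 1091]      # Passage 1, Passage 2
sentences = [60, 54]
objectives = [36, 33]

mean_words = mean(words)            # 966.5, reported as 967
mean_sentences = mean(sentences)    # 57
mean_objectives = mean(objectives)  # 34.5

# Proportion of passage sentences that were objective-relevant.
density = mean_objectives / mean_sentences
print(f"{density:.1%}")  # 60.5%, i.e. "about 60%"
```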


* Gratitude is extended to F. L. Stevenson, Head, 
Systems Training Department at Bell Laboratories 
for permitting us to use the experimental materials. 
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Both types of objectives were located either be- 
fore or after the passage. The only difference was 
in the initial instruction which stated that the sub- 
ject should learn (before) or should have learned 
(after) the list of objectives. 

The tests were also the same as those used in 
the previous studies. They consisted of short- 
answer, fill-in-the-blank items. A test question was 
written for almost every sentence in Passages 1 
and 2 (56 and 52 questions, respectively). This 
procedure permitted measuring both objective- 
relevant (intentional) learning and nonobjective- 
relevant (incidental) learning. The test questions 
for each passage were presented in three different 
random orders. 


Procedure 


All subjects received a set of sequentially num- 
bered manila envelopes containing experimental 
materials. Each envelope had preliminary instruc- 
tions printed on the outside and specific instruc- 
tions printed on pink paper inside the envelope. 
All subjects proceeded at their own pace without 
time limitations. Those subjects receiving objec- 
tives before the passage, either with or without in- 
formation, were given three envelopes consisting 
of (a) objectives and a passage, (b) a test, and (c) 
supplementary reading material to occupy subjects 
finishing early. These subjects were told that they 
would be tested only on information in the pas- 
sage relevant to the objectives. By testing them on 
additional nonobjective-relevant items, a measure 
of incidental learning was achieved. Those subjects 
receiving objectives after the passage, either with 
or without information, were given four envelopes 
consisting of (a) a passage, (b) objectives, (c) a 
test, and (d) supplementary material. These sub-
jects were given the general instruction in Envelope 
1 to learn everything in the passage. They were 
then given the objectives in Envelope 2 as a re- 
view of important material. The reference group 
receiving no objectives was given three envelopes 
containing (a) a passage, (b) a test, and (c) sup-
plementary material. However, they were only 
given the traditional instruction to learn everything 
in the passage. The subjects participated in groups
of no less than 30 in a session. They were required 
to record their start and stop inspection times for 
each envelope. A digital clock with 2 × 5 inch
numerals was provided for that purpose. 


Design 

Percentage correct responses were analyzed with 
a 2 × 2 × 2 analysis of variance with (a) two levels
of objective information (with and without), (b) 
two levels of objective location (before and after), 
and (c) two levels of learning (intentional and in- 
cidental) with repeated measures on the last factor. 
In addition, a 2 × 2 analysis of variance was per-
formed on log transformations of inspection time 
with (a) two levels of information and (b) two 
levels of location. Inspection time data consisted 
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of the combined time spent with the passage and 
objectives. 
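
The inspection-time analysis can be sketched as a balanced 2 × 2 between-subjects analysis of variance on log-transformed times. The code below assumes NumPy and runs on synthetic lognormal times, not the study's data; `two_way_anova` and the factor labels A and B (standing in for information and location) are illustrative. With 60 subjects in each of the four cells, the error degrees of freedom are 4 × (60 - 1) = 236.

```python
import numpy as np

def two_way_anova(cells):
    """Balanced 2 x 2 between-subjects ANOVA.

    cells[i][j] is a 1-D array of scores for level i of factor A and
    level j of factor B. Returns F ratios for A, B, and A x B, plus the
    error degrees of freedom.
    """
    data = np.array(cells, dtype=float)       # shape (2, 2, n)
    n = data.shape[2]
    grand = data.mean()
    cell_means = data.mean(axis=2)
    a_means = cell_means.mean(axis=1)
    b_means = cell_means.mean(axis=0)
    ss_a = 2 * n * np.sum((a_means - grand) ** 2)
    ss_b = 2 * n * np.sum((b_means - grand) ** 2)
    ss_cells = n * np.sum((cell_means - grand) ** 2)
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = np.sum((data - cell_means[:, :, None]) ** 2)
    df_error = 4 * (n - 1)
    ms_error = ss_error / df_error
    # Each effect has 1 df in a 2 x 2 design, so its mean square equals its sum of squares.
    return {"A": ss_a / ms_error, "B": ss_b / ms_error,
            "AxB": ss_ab / ms_error, "df_error": df_error}

rng = np.random.default_rng(1)
# Synthetic inspection times (assumption): lognormal, 60 subjects per cell.
times = [[rng.lognormal(3.0, 0.3, 60), rng.lognormal(3.2, 0.3, 60)],
         [rng.lognormal(2.8, 0.3, 60), rng.lognormal(3.2, 0.3, 60)]]
result = two_way_anova([[np.log(t) for t in row] for row in times])
print(result["df_error"])  # 236
```

Taking logs before the analysis is a common way to reduce the positive skew typical of timing data, which is presumably why the transformation was applied here.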


Subjects 


The subjects were tenth-, eleventh-, and twelfth- 
grade students from three New Jersey high schools. 
Half of the subjects were paid volunteers who par- 
ticipated after the last school class. The other half 
of the subjects participated during school hours
and were aware that a contribution to the student 
activity fund was being made. Subjects from both 
schools were randomly assigned to all treatments. 
The subjects consisted of 118 males and 182 fe-
males (N = 300). This allowed for 60 subjects in 
each of the four treatment groups and the refer- 
ence group. 


RESULTS
Intentional and Incidental Learning 
Two main effects of this analysis were not 


significant. Specifically, no differences in
performance were found (a) between objectives
located prior to (X = .42) and after
(X = .46) the passage (F = 3.26, df =
1/236, p > .05) or (b) between objectives
containing information (X = .44) and not
containing information (X = .44; F = .06,
df = 1/236, p > .05). However, Figure 1
shows a significant Location of Objectives ×
Information Content interaction (F = 10.04,
df = 1/236, p < .01).

Figure 1. Mean proportion of correct intentional
(intent.) and incidental (incid.) test items
for treatment groups receiving objectives with and
without relevant information as a function of objective
location and for a reference (REF) group.
[Horizontal axis: Location of Objectives.]

As expected, performance on intentional
items (X = .50) was greater than that for
incidental items (X = .38; F = 81.03, df =
1/236, p < .001). A significant interaction
was found between the location of objectives
and learning (F = 16.06, df = 1/236, p <
.001). This interaction was a result of increases
in incidental learning when objectives
were presented after (X = .43) rather
than before (X = .33) the passage, while no
differences occurred for intentional learning.
The Objective Information Content ×
Learning interaction was not significant
(F = 2.62, df = 1/236, p > .05). Finally,
the three-way Location × Information Content
× Learning interaction was significant
(F = 13.60, df = 1/236, p < .001). This
interaction can be seen in Figure 1 as resulting
from the Location × Information interaction
found for intentional learning and
the simple increase in incidental learning
with the after-passage objective location.

Paired comparisons between data points
in Figure 1 were accomplished with the
Newman-Keuls technique. For intentional
learning, objectives without information resulted
in greater performance when located
prior to the passage (X = .56) than after
the passage (X = .42; q = 5.04, df = 236,
r = 2, p < .01). Conversely, objectives with
information resulted in greater performance
when located after the passage (X = .57)
than prior to the passage (X = .46; q =
4.15, df = 236, r = 2, p < .01). In addition,
performance on objectives without information
was greater than objectives with information
when located prior to the passage,
while the reverse was found when objectives
were located after the passage (q = 3.59,
df = 236, r = 2, p < .05 and q = [illegible],
df = 236, r = 2, p < .01, respectively). For
incidental learning, performance on objectives
with information was greater when
located after the passage (X = .43) than
before the passage (X = .31; q = 4.44, df =
236, r = 2, p < .01). No difference in performance
was found between objectives
without information when located after the
passage (X = .42) and when located before
the passage (X = .35; q = 2.70, df = 236,
r = 2, p > .05).
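The cell means given in the paired comparisons can be checked against the marginal means reported with the analysis of variance. The following is a minimal sketch of that arithmetic, not part of the original analysis. It assumes equal cell sizes (60 subjects per treatment group), reads the after/with intentional cell as .57, and uses illustrative names (`CELL_MEANS`, `marginal_mean`, `max_discrepancy`) that do not come from the source.

```python
# Cell means (proportion correct) read from the paired-comparison text.
# Key order: (learning, location, information). The value .57 for the
# intentional / after / with cell is an assumed reading of the source.
CELL_MEANS = {
    ("intentional", "before", "without"): 0.56,
    ("intentional", "after", "without"): 0.42,
    ("intentional", "before", "with"): 0.46,
    ("intentional", "after", "with"): 0.57,
    ("incidental", "before", "with"): 0.31,
    ("incidental", "after", "with"): 0.43,
    ("incidental", "before", "without"): 0.35,
    ("incidental", "after", "without"): 0.42,
}

FACTORS = ("learning", "location", "information")


def marginal_mean(**levels):
    """Average the cell means matching the given factor levels (equal n assumed)."""
    index = {name: i for i, name in enumerate(FACTORS)}
    selected = [
        mean
        for key, mean in CELL_MEANS.items()
        if all(key[index[factor]] == value for factor, value in levels.items())
    ]
    return sum(selected) / len(selected)


# Marginal means as reported with the analysis of variance.
REPORTED = [
    ({"learning": "intentional"}, 0.50),
    ({"learning": "incidental"}, 0.38),
    ({"location": "before"}, 0.42),
    ({"location": "after"}, 0.46),
    ({"information": "with"}, 0.44),
    ({"information": "without"}, 0.44),
    ({"learning": "incidental", "location": "before"}, 0.33),
    ({"learning": "incidental", "location": "after"}, 0.43),
]


def max_discrepancy():
    """Largest absolute gap between a recomputed and a reported marginal mean."""
    return max(abs(marginal_mean(**lv) - rep) for lv, rep in REPORTED)


if __name__ == "__main__":
    for lv, rep in REPORTED:
        print(lv, round(marginal_mean(**lv), 4), "reported", rep)
```

Under these assumptions every recomputed marginal falls within about .005 of the reported value, which is what two-decimal rounding of the cell means would allow.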

Comparisons between the reference group 
receiving no objectives and the treatment 
groups were made with one-tailed Dunnett 
tests. The error terms were derived from 
two separate analyses of variance for in- 
tentional and incidental learning items. Both 
analyses were single factors with five levels 
of groups (before/with, before/without, 
after/with, after/without, and reference). 
The intentional learning comparisons
showed that objectives located before the 
passage without information and after the 
passage with information resulted in greater 
performance than the reference group (t = 
3.85 and 4.21, respectively, df = 295, k = 
5, p < .01). No difference was found be- 
tween the reference group and objectives 
located before the passage with information 
or after the passage without information 
(t = 1.47 and .49, respectively, df = 295, 
k = 5, p > .05) for both tests. The inciden- 
tal learning comparisons showed that the 
reference group performance was greater 
than objectives located before the passage 
both with and without information (t = 
3.76, p < .01 and t = 2.48, p < .05, respec-
tively, df = 295, k = 5). No differences were
found between the reference group and ob- 
jectives located after the passage either with 
or without information (t = .10 and .13, 
respectively, df = 295, k = 5, p > .05). 


Inspection Time 


Figure 2 shows that less inspection time
was consumed when objectives were located
before the passage (X = 16.66) than when
they were located after the passage (X =
18.47; F = 5.39, df = 1/216, p < .05). Approximately
the same amount of time was
consumed by groups receiving objectives
with (X = 17.42) and without (X = 17.71)
information (F = .22, df = 1/216, p > .05).
The interaction between these factors was
not significant (F = 1.75, df = 1/216, p >
.05). Although this interaction was not significant,
paired comparisons using the Newman-Keuls
technique showed that for objectives
with information, more time was
consumed when located after (X = 18.90)
than when located prior (X = 15.93) to the
passage (q = 3.64, df = 216, r = 2, p <
.01). Conversely, objectives without information
required about the same inspection
time when located before (X = 17.39) or
after (X = 18.03) the passage (q = .96,
df = 216, r = 2, p > .05).

Inspection time comparisons between the 
reference group and the treatment groups 
were made with one-tailed Dunnett tests. 
The error term for this analysis was derived 
from a single-factor analysis of variance on 
the log inspection time data, with five levels 
of groups (four treatment groups and the 
reference group). The reference group (X =
11.71) consumed less inspection time than
(a) objectives before/with information, (b)
objectives before/without information, (c)
objectives after/with information, and (d)
objectives after/without information (t =
3.48, 4.80, 6.15, and 5.50, respectively, df =
274, k = 5, p < .01) for all four tests.


Figure 2. Mean inspection time (minutes) for treatment
groups receiving objectives with and without relevant
information as a function of objective location
(before, after) and for a reference (REF) group.


Discussion


The results of this experiment tend to 
confirm the predictions posited in the intro- 
duction except for the inspection times be- 
tween objectives with information located 
before text and the reference group. Table 
2 shows the predictions and subsequent 
scores obtained for the treatment and con- 
trol groups for intentional and incidental 
learning. The increased intentional learning 
found when objectives without information 
were presented prior to text over the refer- 
ence group’s performance on the same items 
was predictable from previous studies 
(Kaplan, 1973; Kaplan & Rothkopf, 1974; 
Rothkopf & Kaplan, 1972). This finding 
may be understood in terms of Frase’s 
(1968, 1970) suggestion that questions or 
objectives of this type serve as orienting 
stimuli that result in selective attention to 
relevant material. When objectives are used 
in this way, an opportunity exists for partial 
repetition, selection, and search (Table 1). 
Conversely, this treatment group's inciden- 
tal learning was predicted and found to be 
less than the reference group's performance 
on the same items (Table 2). This differ- 
ence is attributed to the lack of attention 
given to incidental material by the treat- 
ment group and the equal attention given 
to all material by the reference group. 

When objectives with information were 
presented before the text, intentional learn- 
ing was found to approximately equal the 
reference group's performance (Table 2). 
This was predictable in that the treatment 
group strategy was expected to exclude
repetition and search (Table 1). The
slightly greater treatment group mean
performance (X = .45) over the reference
group (X = .40) was probably due to the
reduction of material attended to by the
treatment group as a result of selection (i.e.,
attending to the objectives only). As
expected with this strategy, the treatment
group’s incidental learning was less than the 
reference group performance because the 
treatment group would not have attended to 
incidental text material. 

Intentional learning was greater for sub- 
jects receiving objectives with information 
after text than for the reference group. This 
was predictable in that the treatment group 
could benefit from repetition and summary/ 
review (Table 2). Incidental learning was 
found to be about equal for both groups. 
This was predicted because both groups 
read the text nonselectively. 

Performance, when objectives without in- 
formation were presented after the text, was 
found to be the same as the reference group 
for intentional and incidental learning 
(Table 2). This was predicted because the 
text was read nonselectively by both groups 
and the objectives served as an inadequate 
summary/review in that relevant informa- 
tion was excluded. 

These results suggest that the increased 
learning found with objectives resulted from 
more than one process (repetition, selection, 
and search). Further, the processes em- 
ployed seem to vary with specific learning 
strategies which are influenced by the type 
and method of presenting objectives. Thus,
the results of several studies previously
mentioned can be explained in terms of these
processes.


TABLE 2
PREDICTED OUTCOME AND RESULTS OF TREATMENTS

                                      Intentional                          Incidental
                            Predicted   Reference  Treatment     Predicted   Reference  Treatment
Objective treatment         outcome(a)  X score    X score       outcome(a)  X score    X score
Before/without information      >         .40        .55             <         .43        .35
Before/with information         =         .40        .45             <         .43        .31
After/with information          >         .40        .58             =         .43     [illegible]
After/without information       =         .40        .42             =         .43        .42

(a) Comparison between treatment and reference groups.

The inspection time analysis shows that 
less time is consumed by treatments receiv- 
ing objectives prior to text than after text 
(Figure 2). This may be due to the proce- 
dures employed in the present experiment. 
All subjects were permitted to inspect the 
experimental materials at their own rate. 
Those subjects receiving objectives before 
the text actually inspected the objectives 
and text concurrently. Conversely, subjects 
receiving objectives after the text inspected 
the text first and upon completion of that 
task they were given the objectives. These 
subjects were not instructed that the objec- 
tives would follow the text. Therefore, they 
may have decided to inspect the text for a 
longer time than the objectives-before group 
in that they would assume that this was 
their only opportunity to learn the material. 
Subsequent inspection of the objectives then 
required additional time. In addition, the 
objectives before text treatments could em- 
ploy a selective attention strategy to reduce 
the amount of material studied, thus, reduc- 
ing their inspection times. As predicted, ob- 
jectives without information located before 
the text and both treatments with objectives 
located after the text resulted in greater in- 
spection time than the reference group. This 
was due to the additional time required by 
these treatment groups to use the objectives. 
However, the one unpredicted finding was 
the increased time taken with objectives
located before the text with information 
over the reference group. This treatment 
was expected to use less time than the re- 
ference group because they could eliminate 
all incidental material. Perhaps the predic- 
tion would have been substantiated if fewer 
objectives were presented. That is, even if 
only the objectives were attended to, they 
represented 60% of the total number of text 
sentences. Thus, there was a difference of 
only 23 sentences between treatment and 
reference groups. In addition, the treatment 
group probably used some time to initially 
inspect the objectives and text before deciding
to adopt the strategy of attending to objectives
only. Perhaps more important, all
treatment groups consumed more inspection 
time than the reference group (Figure 2). 
For practical purposes, this finding must be 
considered with respect to performance. 
That is, while more time is required for the 
before/without and the after/with relevant 
information treatments than the reference 
group, these two treatment groups’ perform- 
ance is greater than that of the reference 
group. If mastery of material is important, 
then the use of objectives may be desirable 
even though more inspection time is con- 
sumed. 

The results of this experiment suggest an 
obvious future investigation. The combina- 
tion of objectives without relevant informa- 
tion before the text and objectives with 
relevant information after the text may increase
objective-relevant performance over
either treatment alone. This would result 
because of the combined advantage of 
search and selection when objectives are 
presented before the text and repetition 
when located after the text. Particular attention
should be given to the effects upon
incidental learning and inspection time
under this treatment.


TRAINING FOR THEMATIC-FANTASY PLAY IN CULTURALLY
DISADVANTAGED CHILDREN:
PRELIMINARY RESULTS


ELI SALTZ AND JAMES JOHNSON
Center for the Study of Cognitive Processes, Wayne State University 


Reported herein are the preliminary findings of a broader longitudinal 
study investigating the effects of fantasy play intervention on socially 
and economically disadvantaged preschoolers. Young children directed 
in the role enactment of imaginary stories were found to be signifi- 
cantly superior to control group youngsters on a number of measures of 
social and cognitive development. Fantasy play training was signifi- 
cantly related to a higher incidence of spontaneous sociodramatic 
play and to higher scores on selected subtests of standard IQ tests, 
and it facilitated performance on Borke’s Revised Interpersonal Per- 
ception Test. It also facilitated performance on tasks designed to mea- 
sure story-sequence memory skills and story verbalization skills. On 
the other hand, fantasy play did not significantly enhance ability to 
recall pictures as opposed to objects. In conclusion, it was noted that
fantasy play training is a promising and practical intervention method
enjoyed greatly by both the children and the adult interventionists. 
Further use and study of this technique is encouraged. 


Recently there has been a resurgence of
interest in evaluative intervention research 
on the use of symbolic or imaginative play 
as a vehicle to foster the cognitive growth of 
preschoolers. 

Various writers who have attempted to 
define symbolic or imaginative play, includ- 
ing Feitelson and Ross (1973), Gilmore 
(1966), Klinger (1969), Millar (1968), 
Piaget (1962), Singer (1973), and Smilansky
(1968), have generally distinguished
this form of play from exercise play,
mastery play, and games with rules. Although
symbolic play can take various
forms, it has in common a make-believe or
pretend quality; that is, the child uses his
imagination to amplify and modify the
concrete and immediately present stimulus,
for example, the child pretends to wake up
a sleeping blanket. Major developmental
theorists, for example, Piaget (1962), Vygotski
(1967), and Werner (1948), have
viewed this type of play as an indispensable 
step in cognitive development through 
which the child becomes liberated from 
stimulus boundedness and thereby advances 
toward operational and abstract levels of 
thought. 

There have been a number of training 
studies that have attempted to increase 
different types of imaginative play be- 
havior in young children in general and 
especially in preschoolers from deprived 
social backgrounds (e.g., Feitelson & Ross,
1973; Freyberg, 1973; Marshall & Hahn, 
1967; Smilansky, 1968). The rationale for 
these studies has been based on the assump- 
tions that imaginative play is (a) trainable, 
(b) important for the development of other 
intellectual and social abilities, and (c) 
often undeveloped in young children. In 
regard to this third point, Smilansky (1968)
has suggested that sociodramatic play is
directly taught and encouraged in most
middle-class homes but is virtually absent
in most socially disadvantaged homes.

Smilansky (1968) showed that socially
disadvantaged Israeli children could be
trained to increase their sociodramatic play.
Feitelson and Ross (1973) and Freyberg
(1973) also demonstrated that children from
lower social and economic levels could be
trained to improve their imaginative play.
Weikart, Rogers, Adcock, and McClelland
(1970), in their influential book, advocated
the use of sociodramatic training in programs
designed to improve the cognitive
functioning of socially disadvantaged
preschoolers.

However, despite this increase in interest 
in various forms of dramatic play activities, 
there has been almost no systematic attempt 
to evaluate the contention that any or all 
of these types of activities can have a 
beneficial effect on cognitive functioning.

A type of dramatic play somewhat akin
to Smilansky’s sociodramatic play was introduced
and evaluated in the intervention
study reported here. This form of play can
perhaps best be described as thematic-fantasy
play (TFP). This type of play is similar
to sociodramatic play in that it involves
verbal role enactment in a group.
However, in TFP, children dramatize traditional
folk tales popular with children, for
example, The Three Billy Goats Gruff,
Little Red Riding Hood, etc. Unlike sociodramatic
play, then, TFP employs a structured
play theme or story plot. It is hypothesized
that by providing children with
opportunities to enact story sequences, they
will be helped to see that events are interrelated
and ordered in time and space.
Theoretically, this should promote the
development of conceptual schemata or the
integration of experiences in preschool
children.

Thematic-fantasy play is distinguished 
from sociodramatic play in yet another 
way; that is, TFP involves real fantasy. In 
TFP, children are required to imagine and 
perform behaviors described to them in 
story narration which are never actually 
observed in real life. Thus, TFP demands
more than what is usually meant by the
imitative behavior so central in sociodramatic
play; it demands imagined behavior,
that is, a transition from symbolic story
form into behavior form. Theoretically, this
quality of TFP should contribute to later
ability to use symbols and think creatively.

An intervention program to teach TFP
to disadvantaged children was introduced at
Franklin Preschool, Detroit, Michigan, during
the winter and spring of 1972. We
wished first of all to check the feasibility of
training disadvantaged preschoolers in this
type of play behavior. Once it became
obvious that the children were receptive
to this kind of intervention, our goals
broadened to include an evaluation of the
effects of TFP on various measures of social
and cognitive development. Two experimental
groups and two control groups were
established and training proceeded over a
four-month period. Utilizing a 2 × 2 factorial
research design with TFP training
and dimensionality (D) training as factors,
four intervention curricula were generated:
(a) one group of children received
TFP training only; (b) one group received
D training only, for example, learning to
label and categorize stimuli along various
dimensions; (c) one group received both
TFP and D training; and (d) one group
received neither TFP nor D training; instead,
this group engaged in story listening
and other activities unrelated to either TFP
or D training. During the intervention training,
records were kept of the children's
behavior both while performing in the research
groups and while engaging in spontaneous
free play in the nursery school
classroom. Following intervention training,
the four research groups were tested on
several selected standard instruments and
especially designed tasks in order to assess
the effects of training.


METHOD


Subjects 


Subjects were preschoolers in Franklin Elementary
School, Detroit, Michigan. Four classrooms
of approximately 20 children each were
involved in the project. The subjects were from
primarily lower-economic-class backgrounds and
included approximately 30% southern white, 25%
black, 25% northern white, and 20% Chicano.
Many of the children were from families who re- 
cently moved into the city from the south. During 
the project, 24 subjects of the original sample 
pool of 80 subjects left the preschool and 19 sub- 
jects enrolled in their places. Replacements were 
approximately evenly distributed across the four 
research groups (5, 4, 5, and 5 subjects in the 
thematic-fantasy play group, dimension training
group, mixed group, and control group, respec- 
tively) and received about the same amount of 
training (approximately 10 sessions per replace- 
ment subject). In the remaining sample of 75 sub- 
jects, there were 44 males and 31 females ranging 
in age from 2 years 10 months to 5 years 6 months 
with a median age of 3 years 8 months. 

In each of the four classrooms, subjects were 
divided into four research groups of 5 children 
each; the groups did not differ significantly on age 
and pretest scores. The four matched research 
groups were then designated by chance for the- 
matic-fantasy play (TFP) training; dimensionality 
training (D); mixed TFP and D training; and 
control (C) group activities. Each group met as 
regularly as possible three times a week for 15- 
minute sessions over a four-month period. All 
groups had approximately equal exposure to the 
same three-member team of intervention teachers, 
any two of which were normally present during a 
group session. 


Training Conditions 


A 2 × 2 factorial research design was used in
this study. One factor was thematic-fantasy play, 
the other factor was dimensionality training. This 
produced the following four groups: (a) TFP + 
D (n = 19); (b) TFP (n = 19); (c) D (n = 19); 
and (d) C (n = 18). 

Fantasy (TFP groups). Fantasy subjects were
exposed to a TFP curriculum which consisted of
systematic training in role enactment of action-type
fairy tales (The Three Pigs, Hansel and Gretel,
etc.). Fantasy subjects heard a story read to them,
were assigned parts, and enacted the story with
intervention teachers prompting, narrating, and
at times taking roles in the story themselves. Few
props were used other than chairs and tables which
represented such things as houses, trees, bridges,
etc., depending on the story. The children would
dramatize the same story several times over
successive group sessions and would take turns
playing the various characters in the story. Following
role enactment, children would discuss the
story plot, during which time emphasis was placed
on remembering the story sequence and verbalizing
the “reasons” for the events that occurred in
the story, for example, “Why did the billy goats
cross over the bridge?”, “Why did the baby bear
start to cry?”, etc.

Dimensionality (D) training groups. These subjects
received systematic training in labeling and
classifying activities. Subjects were taught to identify,
describe, and classify social and physical
stimuli along several dimensions. They were encouraged
to verbalize about objects and to discuss
the ways in which things go together in an interactional
setting.

In some training sessions, the children learned 
to recognize various forms and changes of physical 
stimuli. For example, during one session, the pre- 
schoolers explored different kinds of grapes and 
the products of grapes, for example, red, green, 
and purple grapes, raisins, and grape drinks. In 
other training sessions, the children learned to 
recognize various forms and changes of social 
stimuli. For instance, one meeting involved dis- 
cussing sex- and age-appropriate clothing and ob- 
jects. The children matched clothing and objects 
to cut-out cardboard representations of grand- 
father, father, mother, little boy, little girl, and 
baby. In short, in D training, the children were 
given repeated opportunities to discuss social and 
physical objects in a group setting. 

Fantasy plus dimensionality (TFP + D) groups. 
These groups received TFP training on 50% of 
the sessions and D training on 50% of the sessions. 

Control (C) groups. The control groups were 
primarily engaged in listening to the stories used 
for role enactment by the TFP groups without 
dramatizing these stories themselves. These sub- 
jects also participated in other types of activities 
unrelated to TFP such as playing with cut-outs, 
cut-and-paste activities, singing, and so forth. 


Evaluation Procedure 


Pretests. Before the start of the intervention
training, the children were administered (a) the 
Picture Completion subtest of the Wechsler Pre- 
school and Primary Scale of Intelligence (WPPSI) 
and the Visual Reception and Visual Association 
subtests of the Illinois Test of Psycholinguistic 
Abilities (ITPA) as rough indicators of nonverbal 
intelligence and (b) the Similarities subtest of 
the WPPSI as a rough indicator of verbal in- 
telligence. 

Postmeasures. Evaluation of the effects of the 
intervention training utilized play observations 
and both standardized and specially designed tests. 
The following is a description of the assessment 
methods used to evaluate the effects of the intervention
program.

1. Play observations. On 20 different days dur- 
ing the intervention training, the four classrooms 
were each observed for about 20 minutes. The ob- 
server on these occasions watched for the presence 
of either sociodramatic or thematic-fantasy play.
When such play behavior was spotted, the observer 
recorded the names of the children involved. A 
comparison between the proportion of fantasy sub- 
jects and control subjects who were observed at 
least once engaging in such play during the first 10 
observations and during the second 10 observa- 
tions and over all 20 observations was made using 
chi-square analysis. In addition, the change in the 
number of preschoolers observed participating in
dramatic free play from the first set of 10 observations
to the second set of 10 observations was examined
for both fantasy subjects and control subjects
separately.

2. Intelligence subtest postmeasures. The Similarities
and the Picture Completion subtests from
the WPPSI and the Visual Reception and Visual
Association subtests from the original ITPA were
chosen for use in this study. Raw score totals were
computed for each subject and this unit was used
in analysis with pretest score as a covariate. A randomly
selected subset of 34 subjects was administered
the intelligence subtests.

3. Interpersonal Perception Test (IPT). The
IPT, which was designed and described by Borke
(1971) and later revised (Borke, 1973), is a test
for empathy in young children. This test requires
the child to choose the “face” depicting the appro- 
priate affect another child would feel under certain 
prescribed conditions related to the child in a story 
form. The four possible selections the child has are 
“happy,” “sad,” “afraid,” and “mad” faces from 
which the child selects the most appropriate face. 
On each of the 23 items of the revised version, we 
added a “neutral” face possibility and administered 
this modified version of Borke's IPT to a randomly 
selected subset of 47 subjects of the present study. 
Total right in the 23-item test was the score used 
in statistical analysis. 

4. Picture versus object memory. The task
involved five paired associates. Each pair consisted
either of two toys or of two pictures of toys. This task,
designed to measure preschoolers’ ability to represent
concrete materials versus pictorial representations
in memory, was administered to 72 of the 75
subjects involved in the program. The subjects were
tested individually; half of the subjects in each of the
four research groups received one of the two sets of
test materials and half received the other. In Set A,
three of the five pairs were object
pairs and two were picture pairs; in Set B, three
were picture pairs and two were object pairs. The
picture pairs in Set A were the object pairs in Set
B and vice versa. All subjects thus received both
object and picture pairs. Number of errors (i.e.,
failure to respond or incorrect response) for picture
pairs and for object pairs over three trials was
computed for each subject. Preliminary analysis
showed that there were no significant differences in
performance on Set A versus Set B, and these two
sets were combined within each subject group for
additional analysis.

5. Story memory task. This specially constructed
test was used to assess preschool children's ability
to remember a story sequence. Subjects, individually
tested, were first shown a series of six pictures
and were told a story that the pictures depicted.
The pictures were then shuffled and the subject
was instructed to put the pictures back into
the original order. The subject’s score was the degree
to which his order corresponded to the correct
sequence. This score was computed as the tau coefficient
translated into Z scores, on which statistical
analysis was performed. Forty-four randomly
selected subjects were evaluated on this task.

6. Story telling task. This also was a specially
designed task which was used to evaluate preschoolers’
story-telling skill. The subjects, individually
tested, were asked to tell three stories from
three different series of five pictures each. After
each narration, subjects were asked two questions
pertaining to the story. These questions were intended
to clarify the preschooler's understanding
of the reasons behind the actions depicted in the
story pictures. The subjects’ performance on this
task was evaluated in several ways: Computations
of total verbal output on the three stories, total
number of connectives used, and number of inferences
expressed, either spontaneously or in response
to questions, were made for each subject. In addition,
on Stories 1 and 2, each subject was scored
for expressed continuity between pictures on a scale
from 0 to 4. Between each consecutive pair of pictures
in a story, the subject was credited with 1
point if he stated continuity and 0 if he failed to
express continuity. A randomly selected subset of
49 subjects was administered the story-telling task.
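
The scoring of the story memory task (item 5) can be illustrated with a short sketch. The text states only that the score was a tau coefficient translated into Z scores. The code below assumes Kendall's tau without ties and, as one common possibility, the large-sample normal approximation for converting tau to z. The function names and picture labels are hypothetical, and the authors' actual conversion may have differed.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(child_order, correct_order):
    """Kendall's tau (no ties) between a child's arrangement and the correct one."""
    # Position of each picture in the correct sequence.
    rank = {picture: i for i, picture in enumerate(correct_order)}
    ranks = [rank[picture] for picture in child_order]
    concordant = discordant = 0
    # Compare every pair of positions in the child's arrangement.
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] < ranks[j]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(ranks) * (len(ranks) - 1) // 2
    return (concordant - discordant) / n_pairs


def tau_to_z(tau, n):
    """Assumed conversion: z from the null variance 2(2n + 5) / (9n(n - 1))."""
    return tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


if __name__ == "__main__":
    correct = ["A", "B", "C", "D", "E", "F"]   # six pictures, hypothetical labels
    child = ["A", "C", "B", "D", "E", "F"]     # one adjacent pair swapped
    tau = kendall_tau(child, correct)
    print(tau, tau_to_z(tau, len(correct)))
```

With six pictures there are 15 pairs, so a single adjacent swap gives tau = 13/15, and a fully reversed order gives tau = -1.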


RESULTS 


The results of the present intervention
study are organized in seven parts: (a)
observations of each of the groups' behavior
during TFP training; (b) observations of
spontaneous dramatic play; (c) intelligence
test scores; (d) Interpersonal Perception
Test (IPT) scores; (e) picture versus object
memory scores; (f) story memory scores;
and (g) story-telling scores.


Group Observations 


Since TFP is a relatively unevaluated ap- 
proach to intervention training, perhaps a 
few comments about TFP itself are in order. 

It was observed that at first most pre-
schoolers found TFP enjoyable but difficult.
It was necessary to simplify the stories and
participate with the children in the dramati-
zation. It was also found helpful for one of
the intervention teachers to [illegible] the
story while the preschoolers role-played
and to provide the children with lines they
had forgotten. Even with extensive [illegible]
and prompting, it was observed that
most of the role enactment was nonverbal
since, at first, the children proved more will-
ing to act than speak. Moreover, the chil-
dren showed little appreciation for story
sequence. Often the group remembered only
the final scene or the most exciting part of
the story. The children had great enthusiasm
for action but little idea of why the action
was taking place. In short, during the early
stages of TFP intervention, it was apparent 
that the children had much room for im- 
provement. 

With practice, both the children and the 
intervention teachers became more skill- 
ful in TFP. It was discovered that the chil- 
dren functioned better with a minimum of 
props; it appeared that the use of realistic 
props at times distracted the children. Con- 
sequently, fewer props were employed in 
TFP. Also, we found that the children 
seemed to depend on locations in the story 
remaining constant. It was helpful to iden-
tify places in the playing room for the chil-
dren. Knowing locations enabled the pre-
schoolers to orient themselves for role
enactment. They seemed to depend on places 
remaining the same from session to session, 
although they did not seem to mind switch- 
ing roles. They even played opposite-sex 
roles with enthusiasm. 

Sometimes the children reenacted events 
not in the way they happened in the story 
but as they would have liked them to 
happen. Amusing instances of this behavior 
occurred periodically. For example, on one 
occasion a little girl performed her witch’s 
role almost perfectly until the time came to 
be pushed in the oven. Suddenly the little 
actress announced that she was a “good” 
witch and invited Hansel and Gretel’s 
mother over for coffee and cake! This 
tendency to assimilate the story, although 
benevolently accepted when it occurred, was 
something that was discouraged over the 
course of TFP training. With practice, the 
children became more adept at following 
the sequence of a story and more efficient in 
TFP in general. 


Play Observations 


The results of the play observations over 
20 sessions indicate that thematic-fantasy 
play has a significant and positive effect on 
the preschool child’s likelihood of being 
observed participating in dramatic free play 
in nursery school. Significantly more fan- 
tasy subjects than control subjects were 
observed at least once during 20 observa- 
tions engaging in dramatic free play. While 
94.7% or 38 of the fantasy subjects were 
observed in dramatic free play, only 60.5%
or 26 of the control subjects were observed.
Thus, 5.3% or 2 of the fantasy subjects and
39.5% or 17 of the control subjects were
never observed in dramatic free play (χ² =
14.001, df = 1, p = .001). Training in role
enactment of fairy tale stories apparently
enhances the probability that preschoolers
will engage in dramatic social free play in
school.
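
The reported chi-square can be recovered from the counts given in this paragraph. The following is a minimal sketch, assuming a plain Pearson chi-square on the 2 x 2 table with no continuity correction (the text does not say which form was used here); the function name is illustrative and not part of the original analysis.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: fantasy, control.
# Columns: observed in dramatic free play, never observed.
chi2 = pearson_chi_square_2x2(38, 2, 26, 17)
print(round(chi2, 3))
```

With the counts 38, 2, 26, and 17 the statistic comes out at roughly 14.0, in line with the value reported above.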

Examination of the changes in frequency 
of spontaneous TFP over the year for the 
fantasy subjects and the control subjects 
indicates that the differences observed in 
Table 2 are a consequence of the interven- 
tion conditions. For example, over the first 
10 play observations, 65% of the fantasy 
subjects were observed in spontaneous TFP. 
During the second 10 observations, 92% of 
the fantasy subjects were observed in such 
activities. Of the 40 subjects present during
all 20 observations, 12 subjects participated
in TFP during the second 10 observations
who had not engaged in TFP during the
first 10 observations. A chi-square test for
change in frequencies of occurrence (Mc-
Nemar, 1959) indicated that this increase
was significant (χ² = 7.69, p < .01).
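
McNemar's test for change depends only on the subjects who switched between the two halves of the year. The sketch below uses the continuity-corrected form. The 12 subjects who began TFP in the second half are reported in the text; the number who moved in the opposite direction is not stated, and the value of 1 used here is an assumption, chosen because it is consistent with the 65% and 92% figures for 40 subjects.

```python
def mcnemar_corrected(gained, lost):
    """McNemar chi-square (1 df) with continuity correction.

    gained: subjects showing the behavior only in the second period
    lost:   subjects showing the behavior only in the first period
    """
    discordant = gained + lost
    if discordant == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (abs(gained - lost) - 1) ** 2 / discordant


# gained=12 is reported in the text; lost=1 is an assumed, inferred value.
chi2 = mcnemar_corrected(gained=12, lost=1)
print(round(chi2, 2))
```

Under that assumption the statistic is about 7.69, matching the reported value; a different count of subjects who dropped out of TFP would give a different result.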

For the control subjects, on the other 
hand, the corresponding percentages of sub- 
jects observed in spontaneous TFP during 
the first and second 10 observations were 
45% and 40%, respectively; in short, the con-
trol subjects showed a small but nonsignifi-
cant drop in spontaneous TFP.

It may be added that teacher reports 
substantiate this finding. On several occa- 
sions, teachers brought it to the attention 
of the experimenters that certain children 
were engaging in sustained dramatic play 
behavior not only during free play but at 
other times during the school day. For 
example, on one occasion an entire snack 
table, without adult prompting, dramatized 
a birthday party involving imaginary ani- 
mals. Inspection showed that 6 of the 8 
children so playing were fantasy subjects. 


Intelligence Test Scores 


The Picture Completion and Similarities 
subtests of the WPPSI and the Visual 
Reception and Visual Association subtests 
of the ITPA were administered in order to
obtain a rough indication of the effects of 
training on intellectual performance. The 
sum of the raw scores on these four subtests 
constituted each child's score. Since testing 
time was limited, only 34 subjects were 
evaluated, fewer than on any other mea- 
sure. To ensure comparability, postmeasures 
were adjusted for pretest intelligence scores 
in this analysis. (It might be noted that this 
was not necessary for the other postmea- 
sures since the four groups had been 
matched on pretest intelligence scores.) The 
mean intelligence scores adjusted for pretest 
scores for the two fantasy conditions were 
40.8 and 34.0, respectively, for the TFP chil- 
dren who received dimensions training and 
those that did not receive such training. For 
the two conditions that were not involved 
in the fantasy program, the corresponding 
means were 37.4 and 28.8, respectively. 
Analysis of covariance (pretest intelligence 
of child as covariate) showed that the main 
effects for TFP and D were significant (F = 
7.35, df = 1/29, p < .011 and F = 11.8, df = 
1/29, p < .002, respectively). The interac- 
tion was not significant. Both TFP and D 
training, in other words, appear to have 
facilitated intellectual functioning. This 
finding, however, must be interpreted cau- 
tiously since a complete intelligence test 
battery was not administered and relatively 
few subjects were tested. 


Interpersonal Perception Test (IPT) 


The IPT is designed to measure the abil- 
ity of young children to cognitively repre- 
sent another's affective experience. The 
present study suggests that TFP training 
significantly increases the ability of pre- 
schoolers to respond correctly on the IPT. 
The mean scores for fantasy subjects and 
control subjects were 13.48 and 10.83, re- 
spectively (F = 6.319, df = 1/47, p < .05). 
Thus, learning to role-enact characters 
from children’s folk tales apparently in- 
creases the ability of preschool children to 
understand and identify the affective states 
of other children. Evidently, role enactment
training can facilitate role-taking ability. 

In contrast to the effects of fantasy train- 
ing, dimensionality training appears to be 
unrelated to performance on the IPT. 
Neither the main effects of dimensionality
training nor the Dimensionality x Fantasy
interaction approached significance.


Picture versus object memory 


The results of the present study replicate
previous findings that memory for objects
is superior to memory for pictures of these
same objects. The relevant F of 76.55 (df = 
1/72, p < .001) was based on a within- 
subjects analysis. On the other hand, none of 
the main effects or interactions involving 
either fantasy training or dimensionality 
training approached significance. Appar- 
ently, TFP does not improve representa- 
tional function for this type of rote memory. 


Memory for stories 


The ability of preschool children to re- 
member a story sequence was assessed using 
a specially designed story-memory task; 
the children were required to arrange pic- 
tures so as to match an order shown to them 
earlier. (Note that the initial presentation 
was accompanied by narrative which ex- 
plained the sequence of events in the pic- 
tures.) The sequences produced by the chil- 
dren were correlated with the original 
experimental sequence using the tau co-
efficient. The taus were converted to Z scores
so that each child's tau could be considered
a score and a mean tau could be computed
for each group (recall that untransformed
distributions of taus tend to be extremely
skewed); next, to facilitate computation,
1.0 was added to each tau (to convert
scores to positive numbers) and each score
was multiplied by 100.
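
The scoring steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the five-picture sequence and the sample ordering are hypothetical, the function names are invented for the example, and the conversion to Z scores is left out because the text does not say which transformation was used.

```python
from itertools import combinations


def kendall_tau(child_order):
    """Kendall's tau between a child's ordering and the correct order 1..n.

    child_order lists the correct positions of the pictures in the order
    the child placed them; it must contain no ties.
    """
    n = len(child_order)
    if n < 2:
        raise ValueError("need at least two pictures")
    concordant = discordant = 0
    # combinations() keeps positional order, so `first` was placed earlier.
    for first, second in combinations(child_order, 2):
        if first < second:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def rescale(value):
    """Add 1.0 and multiply by 100, as described in the text."""
    return (value + 1.0) * 100


# Hypothetical child who swaps the first two of five pictures.
tau = kendall_tau([2, 1, 3, 4, 5])
print(tau, rescale(tau))
```

A perfectly ordered sequence gives a tau of 1.0 and a fully reversed one gives -1.0, which is why 1.0 is added before scaling to keep every score non-negative.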

The mean score for fantasy subjects was
26.2; for control subjects, 14.9. Analysis of
variance indicated that the effects of fan-
tasy training were significant (F = 49 [illegible],
df = 1/44, p < .05). Neither dimensionality
training nor the Fantasy x Dimensionality in-
teraction approached significance.

It should be noted that while the differ-
ence between the TFP and control groups
proved significant on this task, all the pre-
schoolers found the task extremely difficult.


Story-telling task


Young children are known to view the
events of their lives as a series of “still
pictures,” with little sense of causal relation-
ship between the events. The primary pur-
pose of the present task was to determine if
TFP helped children see the relationship 
between events. A secondary purpose was to 
determine if TFP affected some of the gross 
indices of verbal ability. The task consisted 
of asking the children to tell the simple 
stories illustrated in a sequence of cartoon 
pictures. 

The first measure of ability to see causal 
relations was obtained by scoring Stories 1 
and 2 for consistency of theme from picture 
to picture; for example, did the story to 
Card 2 follow from the story to Card 1 in a 
sequence of pictures? The children in the
TFP groups performed better than the chil- 
dren in the control groups on both stories. 
On Story 1, the TFP children obtained a 
mean score of 1.79, which indicates con- 
tinuity on about 45% of their transitions 
from one picture to the next; the control 
children obtained a mean of .89, which indi- 
cates continuity on 22% of their transitions. 
In Story 2, the respective means were 2.57 
(66% of the transitions) and 2.38 (59% of 
the transitions). The difference for Story 1 
was significant (F = 5.14, df = 1/49, p < 
.028); the difference for Story 2 did not 
approach significance. None of the other 
effects (viz., main effects for dimensions nor 
the Dimensions x Fantasy interaction) ap- 
proached significance for either story. 

The use of inference was a much more 
stringent measure and required that a child 
overtly connect an event in one picture to 
some other event in a different picture in 
such a way that it was clear that the two 
pictures were part of the same story. The
scores on this measure proved to be ex-
tremely skewed, requiring analysis by
means of a nonparametric technique. Anal- 
ysis of the stories showed that 95.5% of the 
22 fantasy children made at least one such 
inferential statement in the two stories 
combined; only 63.7% of the 22 control chil- 
dren used such statements. This difference 
yields a chi-square of 5.028 (p < .05) cor-
rected for continuity.
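
The continuity-corrected chi-square for this comparison can be checked with a short sketch. The counts of 21 and 14 children are assumptions inferred from the reported percentages (95.5% and 63.7% of 22); the text gives only the percentages, and the function name is illustrative.

```python
def yates_chi_square_2x2(a, b, c, d):
    """Chi-square (1 df) with Yates' continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink the cross-product difference by n/2, never below zero.
    adjusted = max(abs(a * d - b * c) - n / 2, 0)
    return n * adjusted ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: fantasy, control (22 children each).
# Columns: made at least one inferential statement, made none.
# 21 and 14 are inferred from the percentages, not stated in the text.
chi2 = yates_chi_square_2x2(21, 1, 14, 8)
print(round(chi2, 3))
```

With these inferred counts the statistic is approximately 5.03, which agrees with the reported value.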

The use of connectives was also seen as
indicating an attempt to integrate the
events of a story, and so the frequency of
connectives was analyzed. These scores
also proved to be extremely skewed, neces-
sitating the use of a nonparametric test of
significance. With 22 children per group,
100% of the fantasy group used connectives
at least once while only 67.7% of the con-
trols used them. The resulting chi-square
was corrected for continuity (χ² = 7.486,
p < .01).

Finally, turning to total verbal output, 
the mean number of words used in story 
telling by the TFP children and the control 
children were 71.7 and 49.6, respectively, 
over the two stories. This difference was
highly significant (F = 7.925, df = 1/40,
p < .01). The effect was approximately the
same for each of the two stories, with no 
interaction between stories and conditions. 

In conclusion, the data for the story-tell- 
ing task suggest that the fantasy children 
made more of an attempt to connect and 
integrate the events in telling a story. In 
support of this conclusion it should be 
mentioned that after each child told his 
story, he was asked questions about it. It 
was found that the children in the fantasy 
group appeared to possess greater compre- 
hension than the control subjects for the 
reasons behind the actions depicted in the 
series of pictures. 


SUMMARY AND CONCLUSION 


This study was conducted primarily to 
utilize and evaluate thematic-fantasy play 
and dimensions training as an intervention 
technique for socially disadvantaged pre- 
schoolers. Our preliminary observations and 
test results indicate that thematic-fantasy 
play distinguishes itself as an enjoyably 
workable and promising intervention tool 
which significantly affects a number of 
positive behaviors and abilities in pre- 
school children. Thematic-fantasy play, un- 
like dimensionality training, was found to 
be significantly associated with a higher 
incidence of spontaneous social-dramatic 
play, superior performance on Borke’s 
(1973) Interpersonal Perception Test, and 
better story-memory and story-telling skill 
on specially constructed tasks. 

The effects of both dimensionality train- 
ing and fantasy play training on intelligence 
were more equivocal. Although a relatively 
large difference in obtained scores occurred 
between fantasy, dimension, and control
groups and although these effects were sta- 
tistically significant, it should be noted that 
sample size was small and that only selected 
subtests were employed rather than a full 
scale test. More work is needed on this issue. 
Finally, there was no indication that fan- 
tasy training or dimensionality training in- 
fluenced ability to recall pictures as opposed 
to objects. If the inferior memory for pic- 
tures, as opposed to objects, is due to poorly 
developed ability for mental representation, 
the type of representation required appears 
to be different from that involved in either 
fantasy play or dimensionality training. 
The present report summarized the find- 
ings from the first year of a long-term re- 
search project. While our conclusions must 
be tentative at this time, the technique of 
fantasy training appears to be more promis- 
ing than the techniques of dimensionality 
training. Any benefits specific to dimension- 
ality training are not indicated by the out- 
come measures of interest in this study. 
Preschool intervention programs in the 
United States and abroad have utilized a 
rather broad range of techniques and meth- 
ods intended to compensate for cognitive 
and socioemotional deficits incurred by con- 
ditions of poverty. One of the major assets 
of thematic-fantasy play as an intervention 
technique is its appeal to preschool-age chil- 
dren. Our observations indicated that al- 
most all of the children found listening to, 
discussing, and then dramatizing “action- 
filled” fairy tales very rewarding. The chil- 
dren appeared to regard thematic-fantasy 
activities as fun, not work. The children's
enthusiasm was shared by the intervention 
teachers, who enjoyed playing with the chil- 
dren, with the net effect that thematic-fan- 
tasy play proved to be a very encouraging, 
workable, and promising intervention tech-
nique, one that deserves further use and
study.
LEARNING BY TELEVISED “PLAZA SESAMO” IN MEXICO¹


ROGELIO DIAZ-GUERRERO²


Centro de Investigaciones Psicopedagogicas, 
A. C. Mexico City, Mexico 


WAYNE H. HOLTZMAN 


Hogg Foundation for Mental Health, 

A controlled experimental study was made of the effects of “Plaza
Sesamo,” a Spanish version of “Sesame Street,” during its first telecast 
season in Mexico City. A total of 221 children from three different 
lowest-class day-care centers, equally divided by ages 3, 4, and 5, and 
by sex, were randomly assigned to experimental and control groups. 
Complete data were later obtained for 173 of these children. A battery 
of nine tests was individually administered pre-, during, and posttele- 
cast. Measures of attention to the program and of attendance were also 
taken. Highly significant differences were found for specific achieve- 
ment tests dealing with general knowledge, numbers, letters, and words 
as taught by “Plaza Sesamo.” Significant differences were also found for 
five cognitive tests only indirectly related to “Plaza Sesamo” as well 
as for Oral Comprehension, a test completely independent of the pro-
gram content. The largest differences occurred in 4-year-olds; the
smallest in 3-year-olds. The rate of learning was consistently faster for
the experimental groups than the controls across the three testing pe-
riods. Amount of attention correlated as high as .49 with gains as mea-
sured by several tests. Low but significant correlations were found be-
tween attendance and amount of gain.


The public acceptance of “Sesame Street”
for preschool education by television in 1970
marked the beginning of a new era. Close 
collaboration of educators, psychologists, 
sociologists, psychiatrists, actors, artists, 
producers, and other specialists resulted in 
large-scale efforts to design, produce, evalu- 
ate, and broadcast children's programs 
aimed at preparing millions of children to 
cope more adequately with early school 
learning. The initial evaluations of “Sesame 
Street” by Ball and Bogatz (1970; Bogatz 
& Ball, 1971) clearly demonstrated signifi- 
cant gains associated with viewing “Sesame 
Street,” although the precise meaning of 


¹ This research was supported in part by the
Ford Foundation Grant 730-0206. We wish to 
acknowledge the continued cooperation of Lic. 
Juan M. Mendoza Chavez and Arq. Guillermo 
Rossell from the Secretaria de Salubridad y 
Asistencia, without whom this study could not 
have been made. We are also deeply grateful to 
Raul Bianchi and Rosario Ahumada de Diaz for 
their assistance in implementing the research de- 
sign, and to Donald Witzke for his assistance in 
analysis of the data. 

² Requests for reprints should be sent to R.
Diaz-Guerrero, Georgia 123, Mexico 18, D. F. 


these gains has been the subject of contro- 
versy (Cook, 1972). Some criticisms of the 
initial evaluations of “Sesame Street” arise 
from the weaknesses inherent in any large- 
scale field study where certain variables 
cannot be rigorously controlled. 

The original “Sesame Street” program 
has been translated into many languages 
with little alteration of the visual images in
the film itself. The development of “Plaza
Sesamo” as a completely new production
adapted to Mexican culture has presented a
unique opportunity for evaluative research
using a true experimental design, thereby
overcoming some of the difficulties in the
early studies of “Sesame Street.” These
formative evaluations serve also as a pre-
lude to more extensive summative evalua-
tions to be undertaken on a large-scale basis
in the near future.

The present experiment, using preschool
children in day-care centers, was designed
by the Centro de Investigaciones Psicopeda-
gogicas, Asociacion Civil (CIPAC), Mexico
City, Mexico. Experimental and control
groups were established in day-care centers
by random selection of children. In addition,
the experiment provided an opportunity to
try out new content-oriented tests designed
for “Plaza Sesamo” and independent tests of
cognitive ability which were adapted for
use with Spanish-speaking preschool chil-
dren.


METHOD 


Subjects 


The directors of eight day-care centers under 
the auspices of the Secretaria de Salubridad y 
Asistencia were approached for cooperation in the 
experiment. Of the four centers expressing a wish 
to cooperate, three had a sufficient number of
children of the right socioeconomic status and pro- 
portionality of ages and sexes to fulfill the re- 
quirements of the research design. Emphasis was 
placed upon children from families in the lowest 
social class. It was especially important to estab- 
lish the value of children’s educational television 
programs for this largely illiterate but highly im- 
portant segment of the population. 

The mothers of all children in the three day- 
care centers within the age range of from 3 to 5 
were interviewed by research assistants using a short 
demographic questionnaire. Nearly all families 
fitted the definition of lowest social class; that is 
to say, the father (or head of the family) had not 
completed primary school and was employed as an 
unskilled worker. At pretest, there were 221 chil- 
dren almost equally divided by age group and sex. 


Experimental Design 


One-half of the children, by age group and sex, 
were randomly selected to constitute the experi-
mental group while the other half was designated
as a control group. Children in the experimental
group were placed in rooms of each day-care center
where they viewed “Plaza Sesamo” programs for
50-minute periods, 5 days a week. The entire series
of 130 programs took 6 months to complete. While
the children in the experimental group were watch-
ing “Plaza Sesamo,” the children in the control
group were viewing cartoons and other noneduca-
tional TV programs in a separate room.

The impact of “Plaza Sesamo” upon the children
who viewed it was evaluated by a series of indi-
vidually administered tests given uniformly to
children in both the experimental and control
groups at three points in time: (a) immediately
prior to the exposure to “Plaza Sesamo” or the
control films; (b) seven weeks later; and (c) six
months later at the end of the experiment. An in-
terval of seven weeks between the first two testing
sessions was chosen since the earlier studies of
“Sesame Street” used this size interval. Any initial
effects should be apparent within this time. In ad-
dition, daily ratings were made by trained ob-
servers using a 4-point scale for measuring the de-
gree of attention exhibited by the children in the

TABLE 1
DISTRIBUTION OF CHILDREN COMPLETING
THE EXPERIMENT

                        Day-care center
Age/group/sex       Aldia   fest   Tizapan   Total
3-year-olds
  Experimental
    male              4       5       3        12
    female            3       5       4        12
  Control
    male              6       5       5        16
    female            6       4       2        12
4-year-olds
  Experimental
    male              6       6       2        14
    female            7       3       3        13
  Control
    male              5       6       4        15
    female            7       4       3        14
5-year-olds
  Experimental
    male              6       6       5        17
    female            6       7       4        17
  Control
    male              8       6       5        19
    female            5       4       3        12
Total                69      61      43       173


experimental group. Due to limited resources, rat- 
ings were not made on every child every day; 
rather, six children were selected randomly within 
each of two day-care centers for observation 
each day. Over the period of 130 programs, every 
child was rated a number of times in these two day- 
care centers. One additional variable consisted of 
the number of absences for each child in both the 
experimental and control groups. 

The number of children remaining in the ex- 
periment for the entire six-month period can be 
seen in Table 1. Of 221 children in the initial sam- 
ple, 173 completed the experiment. No discernible 
bias due to dropouts could be discovered. 
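
The marginal totals of Table 1 can be checked against its cell counts, and the retention rate follows from the two sample sizes given above. The sketch below is only a consistency check. Three entries that are illegible in the scanned table (one count for 3-year-old experimental males and the row totals for 4- and 5-year-old experimental males) are assumed here to be the values implied by the printed column and grand totals; the variable names are illustrative.

```python
# (age, group, sex) -> counts for the three day-care centers, in table order.
# The leading 4 in the first row is inferred from the column totals.
cells = {
    ("3", "experimental", "male"): (4, 5, 3),
    ("3", "experimental", "female"): (3, 5, 4),
    ("3", "control", "male"): (6, 5, 5),
    ("3", "control", "female"): (6, 4, 2),
    ("4", "experimental", "male"): (6, 6, 2),
    ("4", "experimental", "female"): (7, 3, 3),
    ("4", "control", "male"): (5, 6, 4),
    ("4", "control", "female"): (7, 4, 3),
    ("5", "experimental", "male"): (6, 6, 5),
    ("5", "experimental", "female"): (6, 7, 4),
    ("5", "control", "male"): (8, 6, 5),
    ("5", "control", "female"): (5, 4, 3),
}

column_totals = [sum(row[i] for row in cells.values()) for i in range(3)]
grand_total = sum(column_totals)
experimental_total = sum(
    sum(row) for key, row in cells.items() if key[1] == "experimental"
)
control_total = grand_total - experimental_total

# 221 children were in the initial sample.
retention = grand_total / 221

print(column_totals, grand_total, experimental_total, control_total)
print(round(retention, 3))
```

On these figures the columns sum to the printed totals of 69, 61, and 43, and about 78% of the initial sample completed the experiment.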


Individual Test Battery 


Most of the tests were translated and adapted 
from similar tests developed by the Educational 
Testing Service for use in its evaluation of “Sesame 
Street.” Some had also been used in earlier forma- 
tive evaluation studies during the development of 
“Plaza Sesamo” in Mexico. The tests were given 
by carefully trained preschool teachers and psycho- 
logical assistants at each of three points in time, 
the pretest, the during test, and the posttest. A 
brief description of each follows. 

General Knowledge. This test contains 37 items 
yielding a possible range of scores from 0 to 37. 
Items consist of questions such as “What do we
smell with?”, “What do we see with?”, “What is
the name of different parts of the body?”, and
“Which one of these four objects is the heaviest?” 

Numbers. This test contains 60 items dealing 
with such areas as counting, recognizing numbers, 
naming numbers, and finding a specific stimulus 
number among four others. Total score on the test 
can range from 0 to 60. 

Relations. This test contains eight items dealing 
with questions such as “Which bird is closest to 
the cage?”, “Which dog is going across the fence?”, 
and “Which monkeys are between the trees?” To- 
tal score ranges from 0 to 8. 

Embedded Figures. This preschool version of 
Herman Witkin’s Embedded Figures Test is now 
commercially available (Coates, 1971). The in- 
structions were translated into Spanish and pilot 
tested. Each of the 24 items consists of geometric 
designs in which a simple figure is embedded in a 
larger design. The test measures the ability of an 
individual to extricate hidden figures from the 
context. Total score ranges from 0 to 24. 

Parts of the Whole. This test contains 10 items.
In each item there is a picture with separated parts 
and four whole objects (or figures or letters). The 
child is asked to say which of the whole pictures 
can be formed by the separated parts when put to- 
gether. Total score ranges from 0 to 10. 

Ability to Sort. This test contains 16 items; the 
total score can range from 0 to 16. The nature of 
the test can be illustrated by the following exam-
ple: A card is presented with four figures, one with
two shoes and the other three with only one shoe. 
The child is then asked which of the four figures is 
different from the other three. 

Letters and Words. Involves the matching of
letters as well as the recognition and labeling of 
letters. In addition, a number of the items deal 
with word recognition, the reading of words, and 
the matching of figures with words. A total of 51 
items yields a range of scores from 0 to 51. 

Classification Skills. This test contains 24 items
yielding a range of scores from 0 to 24. A typical
item consists of two cards, one of which contains
pictures of three objects (fruits, animals, etc.) that
define a specific class. The other card contains
drawings of four objects, only one of which can be
correctly classified with the three on the first card.
The child is asked to tell which of the four in the
second card belongs with the three in the first.

Oral Comprehension. Developed experimentally 
by Herschel Manuel at The University of Texas 
at Austin for use with bilingual children in either 
Spanish or English (Manuel, 1972), this test con- 
tains 35 items, yielding a total score ranging from 
0 to 35. A typical item consists of a set of four 
pictures of objects, animals, etc., about which the
examiner asks simple questions. The child’s answer 
reveals whether or not he understands the instruc- 
tions or can perceptually identify the figure. Un- 
like a test dealing with perceptual discrimination, 
the objects in this test are sufficiently simple and 
different that any young child can easily recognize 
them. 


Viewing Activities

Experimental and control children were treated 
alike in every sense except for the programs viewed 
on television beginning at 3 o'clock each afternoon.
The experimental groups were taken into a sepa-
rate room to see “Plaza Sesamo” while the control
groups were taken to other classrooms to see car-
toons or other noneducational children's programs
that were available over television at that hour.
The television sets were 25-inch, black-and-white, 
new receivers placed in the front of the room ele- 
vated enough so that every child could see the 
screen and hear with no difficulty. 

Two minor exceptions to the general rule that 
experimental and control children were treated 
alike had to be made because of the space limita- 
tions within the day-care centers. The experimen- 
tal groups saw “Plaza Sesamo” in groups averaging 
in size about 15 children, while the control groups 
viewed cartoons in groups of approximately twice 
this size. This type of distribution might favor 
slightly the experimental cases in any comparisons, 
although in both experimental and control groups 
there was no difficulty in viewing the television 
screen. The second difference between the experi- 
mental and control groups arose from the fact that 
the control cases were kept in the day-care centers 
until 7 o'clock in the evening while the experimen- 
tals were released shortly after 5 o'clock. This dif- 
ference was necessary to prevent the control chil- 
dren from viewing “Plaza Sesamo” on another 
channel when it was broadcast each evening from 
6 to 7 o'clock. Parents of the control children were 
informed that their children would be undertaking 
special activities and games supervised by the as- 
sistants and, therefore, that they should not come 
for their children until 7 p.m. The games and social 
activities were designed to be as noncognitive as 
possible. 

As a further precaution against uncontrolled 
factors influencing the experiment, a few weeks 
after the experiment began, a survey was made of 
all the parents to determine the extent to which 
television sets were used in the home. It was found
that the incidence of television sets was essentially
the same for both groups—67% in the experimen-
tals and 63% in the controls. Still another possi-
bility was that children when absent from the day-
care center might be viewing “Plaza Sesamo” at
home. The survey revealed that 50% of the experi-
mental absentees viewed “Plaza Sesamo” at home
while only 34% of the controls looked at it when
absent due to sickness. The average number of
absences for the entire sample of children was only
24. But, the fairly large standard deviation ([illegible])
indicates that a small number of children were ab-
sent fairly frequently, usually because of colds and
flu.

Some unexpected events threatened to [illegible]
the experiment in spite of every precaution. Dur-
ing the six months of the study, there were two
official vacations. The first one was the Easter va-
cation lasting only four days. The central authori-
ties and day-care personnel were convinced by the 
research staff of the necessity to keep the day-care 
centers open on a regular schedule in spite of the 
Easter vacation, and parents were encouraged to 
bring their children to the day-care centers. The 
second vacation was much more difficult, lasting a 
period of 11 days in May, toward the end of the 
experiment. Ordinarily, all of the personnel within 
the day-care centers would be away for this holi- 
day. There was a real danger that children in the 
control groups would view “Plaza Sesamo” on their 
home television sets if they were not kept in the 
day-care centers during this 11-day period. The
authorities were persuaded to keep the day-care
center open in the afternoons from 2 to 7 P.M. in
spite of the holiday. As an inducement to parents
to cooperate in the plan, the personal color tele- 
vision set of the principal investigator was donated 
as a prize to be raffled off among the parents of 
those children who had perfect attendance records 
during the 11-day period. Attendance during these 
days reached the highest peak of the entire study 
for both experimental and control groups. The 
television set was won by the father of one of the 
control children and he appeared the next day in 
his best attire, posing happily with his wife and 
child for photographs and basking in his enviable 
new status as the proud possessor of a color TV set. 

It is believed that the extraordinary steps taken 
to prevent any contamination of the control groups 
and to insure continuous viewing of “Plaza Ses- 
amo" by the experimental groups were fully justi- 
fied. In no case did it appear that the experimental 
design had been compromised. We believe the re- 
sults can be interpreted with confidence. 


RESULTS 


Three different statistical methods were 
employed in analyzing the data. The first 
method consisted of a large intercorrela- 
tion matrix containing all the test vari- 
ables at each of the three points in time 
when measurement took place—pretest, 
during the treatment, and posttest. In ad- 
dition, the sex of the child and identifica- 
tion of the day-care center were cast in the 
form of dichotomous variables and entered 
into the correlation matrix. The resulting 
point-biserial correlations were utilized for
testing the significance of the difference be- 
tween experimental and control, between 
male and female, and between the three 
day-care centers (one vs. the other two com- 
bined), since the results are equal to those 
obtained by Fisher's t tests. The last variables
included in the matrix consisted of the
mean attention score for each child in the
experimental group (with a reduced N, of 
course) and the actual number of absences. 
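
The equivalence between a point-biserial correlation and a two-group t test, which the paragraph above relies on, can be checked numerically. The following is a minimal sketch only: the scores are invented for illustration and are not data from the study, and the helper names (`pearson_r`, `t_from_r`, `pooled_t`) are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) implied by a correlation r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def pooled_t(group1, group0):
    """Classical two-sample t test with a pooled variance estimate."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = mean(group1), mean(group0)
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    pooled_var = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


# Invented scores (assumption): 1 = experimental, 0 = control.
experimental = [26, 30, 22, 28, 31, 24]
control = [19, 23, 17, 21, 25, 18]
scores = experimental + control
group = [1] * len(experimental) + [0] * len(control)

r_pb = pearson_r(group, scores)
t_via_r = t_from_r(r_pb, len(scores))
t_direct = pooled_t(experimental, control)
print(round(r_pb, 3), round(t_via_r, 3), round(t_direct, 3))
```

The two t values agree to rounding error, which is why testing the correlation of a dichotomous variable with a test score is the same as testing the difference between the two group means.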

The computer program allowed for a var- 
iable number of cases for each correlation in 
the matrix and printed out a separate trian- 
gular matrix containing the number of cases 
for each correlation. The resulting super- 
matrix consisted of three matrices of inter- 
correlations of special interest, the intercor- 
relations for the pretest measures, the inter- 
correlations for the during-treatment test 
measures, and the intercorrelations for the 
posttest measures. The remaining matrices 
in the supermatrix consisted of correlations 
across time, such as the correlation of Gen- 
eral Knowledge at pretest with Numbers at 
posttest. The diagonals in these matrices 
consist of test-retest correlation coefficients 
for each of the measures employed in the 
study. Only the highlights of this correla- 
tional analysis can be presented in the pres- 
ent report.* 
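
The bookkeeping described above, in which each correlation is based on whatever cases are available for that pair of variables and a companion triangular table records the number of cases, can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the variable names and values are invented, and missing scores are represented by `None`.

```python
import math


def pairwise_r(x, y):
    """Correlation over the cases present on both variables, plus that N."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def correlation_tables(data):
    """Return two triangular tables keyed by variable pair: r and N."""
    names = list(data)
    r_tri, n_tri = {}, {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r_tri[(a, b)], n_tri[(a, b)] = pairwise_r(data[a], data[b])
    return r_tri, n_tri


# Invented data (assumption); None marks a child with no score on that measure.
data = {
    "general_knowledge_pre": [12, 15, 9, 20, 18, 11],
    "numbers_post": [14, 19, None, 25, 21, 13],
    "attention": [None, 3.5, None, 4.0, 3.0, 2.0],
}
r_tri, n_tri = correlation_tables(data)
for pair, r in r_tri.items():
    print(pair, round(r, 2), "N =", n_tri[pair])
```

Keeping the N table alongside the r table matters because the same size of correlation can be significant for one pair of variables and not for another when the number of cases differs.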

For clarity of discussion, the nine tests in 
the battery given each child can be classified 
into three different types. The first category 
can be called content-achievement tests
since all of them are criterion measures of 
skills specifically taught in the "Plaza Ses- 
amo" programs. General Knowledge, Num- 
bers, and Letters and Words were con- 
structed to measure the achievement of each 
child in each of these areas. The second cat- 
egory of tests involves measures indirectly 
related to “Plaza Sesamo" and can be called 
cognitive-content measures. Five tests are 
classified in this category—Relations, Parts 
of the Whole, Ability to Sort, Classification 
Skills, and Embedded Figures. These tests 
are content-related to “Plaza Sesamo” in 
the sense that one of the goals of "Plaza 
Sesamo" is to increase the child's abilities in 
these areas. But the tests themselves are not 
specifically criterion measures. The third 


* Complete results for the correlational studies, 
as well as other statistical analyses, have been 
deposited in three data archives where they are 
available upon request: (a) CIPAC, Condor 214, 
Mexico 20, D. F.; (b) Children's Television Workshop,
One Lincoln Plaza, New York, New York
10023; and (c) Hogg Foundation for Mental 
Health, University of Texas, Austin, Texas 78712. 
TABLE 2
INTERCORRELATIONS AMONG PRETEST SCORES FOR 73 FOUR-YEAR-OLDS

Test                         PN    PR    SP    PPT   HPC   PL    CS    PCO
General Knowledge            .67*  .42   .23   .12   .06   .31   .36
Numbers (PN)                       .27   .24   .08
Relations (PR)                           -.06
Embedded Figures (SP)
Parts of the Whole (PPT)
Ability to Sort (HPC)
Letters and Words (PL)
Classification Skills (CS)
Oral Comprehension (PCO)


.001 levels, respectively. 


category of tests is called independent-cog- 
nitive measures and consists of tests that 
have no direct or indirect relation to the 
stated goals of “Plaza Sesamo." Oral Com- 
prehension is an example of such tests. One 
would expect to find the most significant dif- 
ferences between experimental and control 
groups on the content-achievement mea- 
sures and the least significant differences on 
the independent-cognitive tests. 

Intercorrelations among the nine major
tests in these three different categories
turned out to be of generally low order. With 
the exception of moderately high correla- 
tions for General Knowledge and Numbers 
(in the .60s), intercorrelations ranged from 
zero into the .30s and .40s. A sample matrix, 
the one for intercorrelations among pretest 
scores for all 4-year-olds, is presented in Ta- 
ble 2. The generally low level of intercorre- 
lations indicates that differences between 
the experimental and control groups can be 
looked at independently for each of the mea- 
sures without fear of undue repetition. 

The second method of analysis consisted 
of a series of analyses of covariance in which 
the during-test measures and the posttest
measures were adjusted statistically for re- 
gression upon the pretest measures. This 
analysis is especially powerful where signifi- 
cant correlations exist across time for a 
given measure. Of course, where there is no 
correlation between initial and later mea- 
sures on a particular test, the analysis of 
covariance reduces to a simple analysis of 
variance of the posttest scores. Analysis of 
covariance adjusts for initial differences
(even though these are essentially random),
yielding estimates of the net gain made by
the experimental children who viewed
“Plaza Sesamo” as contrasted to the control
cases who did not.
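
The adjustment described above can be illustrated with the standard covariance-adjusted mean: each group's posttest mean is moved along the pooled within-group regression slope by the amount its pretest mean departs from the grand pretest mean. The sketch below uses invented scores and a hypothetical helper `adjusted_means`; it is not a reconstruction of the program used in the study.

```python
from statistics import mean


def within_sums(pre, post):
    """Within-group sum of cross-products and sum of squares of the pretest."""
    m_pre, m_post = mean(pre), mean(post)
    sxy = sum((a - m_pre) * (b - m_post) for a, b in zip(pre, post))
    sxx = sum((a - m_pre) ** 2 for a in pre)
    return sxy, sxx


def adjusted_means(groups):
    """groups maps a group name to (pretest scores, posttest scores)."""
    sxy_total = sxx_total = 0.0
    all_pre = []
    for pre, post in groups.values():
        sxy, sxx = within_sums(pre, post)
        sxy_total += sxy
        sxx_total += sxx
        all_pre.extend(pre)
    slope = sxy_total / sxx_total          # pooled within-group slope
    grand_pre = mean(all_pre)
    adjusted = {
        name: mean(post) - slope * (mean(pre) - grand_pre)
        for name, (pre, post) in groups.items()
    }
    return slope, adjusted


# Invented scores (assumption): the control group starts slightly higher.
groups = {
    "experimental": ([10, 12, 14, 16], [20, 23, 24, 27]),
    "control": ([12, 14, 16, 18], [17, 18, 21, 22]),
}
slope, adjusted = adjusted_means(groups)
print(slope, adjusted)
```

In this example the raw posttest difference is 4.0 points, while the adjusted difference is 6.0 points, because the control group's small initial advantage is removed. When the pooled slope is zero, the adjusted means equal the raw posttest means, which is the sense in which the analysis reduces to a simple analysis of variance.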

The third statistical method employed in 
analysis of the test data was analysis of 
variance in a repeated measures design to 
study trends in both experimental and control
groups across time. Independent variables
in the design were treatment (experimental
vs. control), sex of child (male vs.
female), and trials (pretest, during test,
posttest), a 2 × 2 × 3 design. This analysis
of variance design was employed separately
for each of the three age groups. Still
another analysis was undertaken in which
the sex of the child was ignored and the three
factors in the design consisted of age group
(3, 4, and 5), treatment, and trial. And finally,
a simple analysis of variance ignoring
trials and dealing only with the posttest
scores was undertaken for each of the three
age groups separately, as well as the total
sample combined, using only two independent
factors: sex (male vs. female) and
treatment (experimental vs. control).

The highlights of the findings from these
several methods and analyses are presented
for each of the three categories of tests, one
test at a time.


Content-Achievement Tests 


General Knowledge. The most rigorous
test of gains made by the experimental
groups who viewed “Plaza Sesamo,” as compared
to the control groups who did not, is
TABLE 3
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(GENERAL KNOWLEDGE)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 14.7 | 12.0 | 2.7  | 5.4 | .49
4-year-olds | 26.2 | 18.9 | 7.3* | 4.7 | .63
5-year-olds | 30.3 | 25.5 | 4.8* | 4.9 | .64
Combined    | 24.3 | 19.3 | 5.0* | 5.5 | .78

* E-C difference significant at or beyond .001
level.


provided by the analysis of covariance. The 
results for General Knowledge are presented 
in Table 3. The 4-year-olds made the largest 
gain in General Knowledge, as evidenced by 
the superiority of the experimental over the 
control, a difference of 7.3 points. The 5- 
year-olds who viewed “Plaza Sesamo" also 
showed significant improvement, the experi- 
mental group receiving a mean score of 30.3, 
as compared to a mean of 25.5 for the con- 
trol cases on the adjusted posttreatment 
scores. Only the 3-year-olds failed to show 
a significant difference between the experi- 
mental and control groups, although the 
small difference noted (2.7) is in the expected
direction. Since the maximum score
on General Knowledge is 37, the 5-year-olds
who viewed “Plaza Sesamo” generally found
the test very easy. The apparently greater
improvement for the 4-year-olds, as contrasted
to the 5-year-olds, may be partly
due to a minor ceiling effect for the 5-year-old
experimental children. In both cases,
however, the amount of improvement for the
experimental group is dramatic.

FIGURE 1. Treatment by trial interaction for
General Knowledge.

It should also be noted in Table 3 that 
General Knowledge is fairly stable across 
time, yielding correlations ranging from .49 
for 3-year-olds to .64 for 5-year-olds. The 
pre- and postcorrelation for the total sample 
of all ages combined is somewhat higher 
than for each of the ages separately (.78) 
because of the greater heterogeneity in the 
total sample since age is not controlled. The 
adjusted standard deviation of the post- 
treatment scores remains fairly constant 
across the three age groups, ranging from 
4.7 for 4-year-olds to 5.4 for 3-year-olds. 

Greater insight into the trends from the 
beginning to the end of the experiment can 
be gained by looking at the interaction of 
trial (the three points at which measure- 
ments were taken) with age group and 
treatment (experimental vs. control). Since 
the triple interaction involving all three fac- 
tors is not significant, any trend differences 
between the experimental and control groups 
can be looked at for all three age groups 
combined. In a similar manner, different 
trends for each of the three age groups can 
be looked at without regard to whether the 
child is in the experimental or control group. 
The interaction of treatment by trial is pre- 
sented in Figure 1. Here it can be seen that, 
although the experimental and control cases 
start out at essentially the same point, the 
experimental group increases more rapidly 
than the control group throughout the ex- 
periment. 

The interaction of age by trial for experi- 
mental and control groups combined is pre- 
sented in Figure 2. Gains made by the 3- 
year-olds are flatter than those for 4- and 
5-year-olds. Both the 3- and 5-year-olds 
show slightly decelerating trends over time, 
while the 4-year-olds continue to show con- 
stant gains over time. While these absolute 
differences in slopes are not large, the inter- 
action is highly significant. 
FIGURE 2. Age by trial interaction for General
Knowledge.


The two-way analysis of variance for the 
combined age groups involving treatment 
and sex revealed no sex differences for Gen- 
eral Knowledge, indicating that the major 
impacts of “Plaza Sesamo” upon perform- 
ance on General Knowledge apply equally 
to both boys and girls. 

Numbers. Results for the analysis of co- 
variance for the Numbers Test are presented 
in Table 4. As in the case of General Knowl- 
edge, 4-year-olds show the greatest gain 
from viewing “Plaza Sesamo.” Unlike Gen- 
eral Knowledge, however, even the 3-year- 
olds show a significant gain in ability to 
cope with numbers. Results for the analyses 
of variance across trials involving Numbers 
proved to be essentially the same as the re- 


TABLE 4
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES (NUMBERS)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 17.0 | 12.6 | 4.4*  | 5.0 | .32
4-year-olds | 28.9 | 21.1 | 7.8** | 6.9 | .39
5-year-olds | 35.5 | 29.3 | 6.2** | 6.0 | .56
Combined    | 27.9 | 21.5 | 6.4** | 7.1 |

* E-C difference significant at .01 level.
** E-C difference significant at or beyond .001
level.
TABLE 5
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(LETTERS AND WORDS)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 10.4 | 7.5  | 2.9*  | 3.1 | .39
4-year-olds | 14.9 | 10.4 | 4.5** | 3.9 | .21
5-year-olds | 17.8 | 12.7 | 5.1** | 4.4 | .16
Combined    | 14.7 | 10.3 | 4.4** | 4.5 | .43

* E-C difference significant at .01 level.
** E-C difference significant at or beyond .001
level.


sults already reported for General Knowl- 
edge. The experimental group showed a 
steady improvement at a much faster rate 
than did the control group; a significant age 
by trial interaction indicates that the 3- 
year-olds improved at a much slower rate 
than did the older children, and no sex dif- 
ferences were noted. 

Letters and Words. Table 5 contains re- 
sults from the analysis of covariance for 
Letters and Words. This test proved to be 
much more difficult for many of the children 
than did either General Knowledge or Num- 
bers. The adjusted posttreatment mean score 
for even the experimental 5-year-olds is 
only 17.8 out of a maximum score of 51. The 
computer program employed for analysis of 
covariance also tests for assumptions under- 
lying covariance analysis. In the case of 
Letters and Words for 4- and 5-year-olds, 
the assumption of a common linear slope 
cannot be fully justified. In one case (5-year-olds),
the regression of posttreatment
scores upon initial scores departs significantly
from linearity. In the case of the 4-year-olds,
the linear slopes for regression
for the experimental and control groups are
significantly different. For this reason, the
magnitude of the expected posttreatment
mean scores may be slightly distorted, although
there can be no question about the
basic significance of the difference between
experimental and control groups. In all three
age groups, the correlation between initial
and posttreatment scores on Letters and
Words is so low (ranging from .16 to .39)
that very little adjustment has taken place
in the posttreatment scores by use of regres- 
sion techniques in analysis of covariance. 

Greater insight into the differences be- 
tween the experimental and control groups 
in their patterns of improvement over time 
can be gained by examining the interactions 
in the analysis of variance involving re- 
peated measures. Lack of a significant tri- 
ple-order interaction among age group, 
trials, and treatment indicates that the sig- 
nificant interaction of treatment by trials 
can be easily interpreted. The experimental 
group clearly improves across the three 
measurement periods while the control 
group does not. Both the experimental and 
control groups have means of approximately 
8.6, initially, whereas the posttreatment 
means are 14.4 for experimental and 10.2 
for control, a difference significant well be- 
yond the .001 level. No significant differ- 
ences were noted between boys and girls. 

All three of the tests classified as content-achievement
measures showed similar results
in the analyses of covariance and variance.
Although experimental and control
groups started out at essentially the same 
place in the pretest measures, the experi- 
mental group quickly outdistanced the con- 
trol in the first seven weeks of the experi- 
ment, maintaining and increasing their gains 
by the end of the six months. This widening 
gap between experimental and control cases 
is generally accentuated among the 4-year- 
olds and only moderately noted among the 
3-year-olds. It is not surprising that the 
youngest children should show the least 
amount of improvement when viewing 
“Plaza Sesamo” since most of the measures
are designed for slightly older children. 
Clearly, those measures in the test battery 
that are specifically related to the instruc- 
tional segments of “Plaza Sesamo” programs 
do indeed show the expected improvement 
from viewing “Plaza Sesamo.” 


Cognitive-Content Tests 


Relations. Table 6 contains results from 
the analysis of covariance for Relations, one 
of the five cognitive content measures used 
in the study. It is interesting to note that
the correlation between initial and post- 
TABLE 6
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES (RELATIONS)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 3.7 | 3.2 | .5   | 1.4 | .19
4-year-olds | 5.6 | 4.0 | 1.6* | 1.5 | .01
5-year-olds | 5.9 | 5.2 | .7   | 1.7 | .19
Combined    | 5.1 | 4.2 |      |     | .21

* E-C difference significant at .01 level.


treatment scores on Relations is essentially 
zero, ranging from .01 to .19 among the 
three age groups. Differences between the 
experimental and control groups were sig- 
nificant only for the 4-year-olds and then 
only at the .01 level. 

Essentially the same results were obtained 
in the analysis of variance using repeated 
measures across the three testing periods; 
only the 4-year-olds show any significant 
advance in performance on Relations as a 
result of viewing “Plaza Sesamo.” Probably 
the relatively insignificant results for Rela- 
tions arise in large part from the unrelia- 
bility of the test. The lack of any correla- 
tion across time suggests little test-retest 
stability. With only eight items in the test, 
it is not surprising that the gains are rather 
inadequately measured. 

Parts of the Whole. The analysis of co- 
variance for Parts of the Whole is summa- 
rized in Table 7. The differences between 
experimental and control groups are rela- 
tively small although in the expected direc- 


TABLE 7
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(PARTS OF THE WHOLE)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 3.4 | 2.9 | .5    | 1.5 | .21
4-year-olds | 5.0 | 4.4 | .6    | 2.1 | -.02
5-year-olds | 6.7 | 5.1 | 1.6*  | 2.5 | .06
Combined    | 5.2 | 4.2 | 1.0** | 2.3 | .14

* E-C difference significant at .05 level.
** E-C difference significant at .01 level.
TABLE 8
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(ABILITY TO SORT)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 3.4 | 2.9 | .5   | 1.7 | .21
4-year-olds | 7.7 | 4.7 | 3.0* | 2.6 | .07
5-year-olds | 9.6 | 6.4 | 3.2* | 3.2 |
Combined    | 7.5 | 4.9 | 2.6* | 3.1 | .32

* E-C difference significant at or beyond .001
level.


tion. Only the 5-year-olds, when considered 
alone, show a difference that reaches statis- 
tical significance. When all three groups are 
combined, however, the greater stability of 
results with a much larger number of cases 
yields a reliable difference in favor of the 
experimental groups. It should be noted, 
however, that the difference in absolute 
terms is very small, amounting to only 1 
item on the average. Since there are only 10 
items in this test, these results suggest that 
the stability of measurement and the clarity 
of results would be greatly enhanced by increasing
the length of the test and obtaining
more discriminating items for any future
revision. 
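
The expected benefit of lengthening a short test can be gauged with the Spearman-Brown formula, a standard psychometric result that is not part of the original analysis. The sketch below assumes a hypothetical reliability of .20 for the 10-item test; the study reports pre-post correlations rather than reliability coefficients, so the starting value is purely illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor
    with items comparable to the existing ones (Spearman-Brown formula)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical starting reliability for a 10-item test (assumption).
current = 0.20
for factor in (1, 2, 3):
    predicted = spearman_brown(current, factor)
    print(f"{10 * factor} items: predicted reliability {predicted:.2f}")
```

Under these assumptions, doubling the test to 20 items raises the predicted reliability from .20 to about .33, and tripling it to 30 items raises it to about .43, provided the added items are of similar quality.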

In the complex analysis of variance using 
repeated measures, Parts of the Whole 
shows significant interactions between both
age groups and treatment, on the one hand, 
and trials on the other. As would be ex- 
pected from the covariance analysis, the ex- 
perimental group shows a sharper rise in 
mean score across time than does the con- 
trol group. The increase is appreciably 
greater for 5-year-olds than for either of 
the younger groups. No sex differences were 
discovered. 

Ability to Sort. As indicated in Table 8,
after viewing “Plaza Sesamo” both 4-year-olds
and 5-year-olds show significant gains
in their ability to sort different objects,
things, or persons. Only the 3-year-olds
failed to show any significant difference, although
a slight trend appears in the expected
direction. When all three groups are
combined, the results are highly significant,
of course. Again, it should be noted that
there is a relatively low correlation between 
pre- and posttest measures, in this case es- 
pecially for 4-year-olds. In the case of the 
4- and 5-year-olds, a net gain of 3 points for 
the experimental over the control groups is 
indeed an appreciable one. 

Greater insight into the meaning of these 
results can be obtained by closer examina- 
tion of the analysis of variance using re- 
peated measures. Highly significant inter- 
actions were obtained for age by trial and 
treatment by trial, although the triple inter- 
action was not significant. Consequently, 
generalizations can be made for all three 
age groups with respect to the superior per- 
formance of the experimental groups. The 
combined mean for all three age groups in 
the pretest period is about 3.8 for both experimental
and control cases. The gap between
experimental and control groups widens
seven weeks later to 5.5 and 4.8 for
experimental and control groups, respectively.
By the end of the experiment, the
children who viewed “Plaza Sesamo” received
a mean of 7.3 as contrasted to only
4.9 for the control cases. It is interesting to
note that the experimental group steadily
improved throughout the experiment while
the control group showed only minor improvement
(probably due to test-retest
adaptation) in the early stages of the experiment,
showing no improvement thereafter.
Again, no differences were noted between
boys and girls regardless of age group.
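
The pattern just described can be read directly from the three reported means for each group. The short sketch below computes the gain between successive testing periods, using the approximate means given in the text (3.8 at pretest for both groups); the function name is hypothetical.

```python
# Combined means for Ability to Sort at pretest, during test, and posttest,
# as reported in the text (the pretest value is approximate).
means = {
    "experimental": [3.8, 5.5, 7.3],
    "control": [3.8, 4.8, 4.9],
}


def successive_gains(scores):
    """Gain from each testing period to the next, rounded to one decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(scores, scores[1:])]


gains = {group: successive_gains(scores) for group, scores in means.items()}
print(gains)
```

The experimental group gains about 1.7 and then 1.8 points, while the control group gains 1.0 point early and only 0.1 point afterward.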

Classification Skills. Results of the analysis
of covariance are summarized for Classification
Skills in Table 9. All three age
groups show significant differences between
the experimental and control groups. The
greatest difference appears among the
4-year-olds, a finding consistent with results
reported for other measures. The interactions
of treatment with age and trial in the
analysis of variance using repeated measures
were also highly significant. The experimental
group shows a steady and
marked increase across the testing periods
while the increase for the control group is
rather slight. No differences were noted between
boys and girls.

Embedded Figures. The last of the five
TABLE 9
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(CLASSIFICATION SKILLS)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 6.9  | 4.9  | 2.0*   | 2.3 | -.01
4-year-olds | 11.9 | 8.7  | 3.2*** | 2.9 | .37
5-year-olds | 14.4 | 12.1 | 2.3*   | 3.6 | .36
Combined    | 11.5 | 8.7  | 2.8*** | 4.0 | .45

* E-C difference significant at .05 level.
** E-C difference significant at .01 level.
*** E-C difference significant at or beyond .001
level.


measures classified as cognitive-content 
measures is Embedded Figures, the pre- 
school adaptation of Witkin’s Embedded 
Figures Test. It is less closely related to the 
content of “Plaza Sesamo” than any of the 
other tests in this cognitive-content group, 
although some of the instructional sequences 
in “Plaza Sesamo” appear closely related 
to the type of ability measured by Em- 
bedded Figures. Table 10 summarizes re- 
sults for the analysis of covariance. 

Unlike most of the other tests, Embedded
Figures shows a relatively high degree of
stability across the six-month period from
pre- to posttreatment measurement, with correlations
ranging from .45 to .64. Consequently,
the covariance analysis is appreciably
more precise than would be a simple
analysis of variance of the posttreatment
scores alone. Only the 4-year-olds show a
significant difference between the experimental
and control groups and even this difference
is significant at only the .05 level.
Among the 5-year-olds, there appears to be
no difference between the adjusted mean
values for the experimental and control
groups. For the 3-year-olds, the difference
is somewhat greater (3.7 points) and almost
reaches significance (p = .06).

Since there are 24 items in the test, it is
obvious that Embedded Figures is much too
easy for the 5-year-olds, probably accounting
for the lack of difference between the experimental
and control groups. The large
standard deviation (6.8) for adjusted posttreatment
scores among the 3-year-olds suggests
that the individual differences among
young children on this test are so great that 
even a difference of 3.7 points, as obtained 
between experimental and control groups, is 
not sufficient to be impressive. When the 
analyses of covariance for posttest scores 
regressed upon during-treatment scores and 
for during-test scores regressed upon initial 
scores are examined, it is clear that among 
the 3-year-olds what little gain there is 
takes place rather late in the experiment. 
No significant sex differences were noted. 
The above results suggest that Embedded 
Figures would be a promising instrument 
for future studies of this type provided it is 
revised and extended to overcome the defi- 
ciencies noted above. 

While results for the cognitive-content 
tests are not quite as striking as those for 
the more specific content-achievement tests, 
in every one of the five tests at least one age 
group showed significantly greater improve- 
ment over time for those children who 
viewed “Plaza Sesamo” than for those who 
did not. The measures most consistent in 
this regard were the two tests dealing with 
different aspects of categorizing behavior— 
Ability to Sort and Classification Skills. The 
results might have been more striking for 
Parts of the Whole and for Relations if 
these two tests were not so unstable across 
time. Since both of them can be improved 
considerably by lengthening the tests and 
including more discriminating items, the 
cognitive abilities represented by them 
should be studied carefully in future evalua- 
tions of “Plaza Sesamo.” Even Embedded 


TABLE 10
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(EMBEDDED FIGURES)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 13.0 | 9.3  | 3.7  | 6.8 | .45
4-year-olds | 19.4 | 17.4 | 2.0* | 3.7 | .48
5-year-olds | 20.9 | 20.4 | .5   | 3.0 | .64
Combined    | 18.0 | 16.1 | 1.9* | 5.0 | .68

* E-C difference significant at .05 level.
TABLE 11
POSTTREATMENT MEAN SCORES FOR EXPERIMENTAL
AND CONTROL GROUPS ADJUSTED FOR
REGRESSION ON INITIAL SCORES
(ORAL COMPREHENSION)

Age group   | Experimental (E) | Control (C) | E-C difference | Adjusted SD | Pre-post r
3-year-olds | 13.6 | 10.5 | 3.1*   | 5.4 | .44
4-year-olds | 23.4 | 18.1 | 5.3**  | 6.4 | .22
5-year-olds | 26.2 | 23.8 | 2.4*   | 4.2 | .59
Combined    | 21.6 | 17.9 | 3.7*** | 6.1 | .67

* E-C difference significant at .05 level.
** E-C difference significant at .01 level.
*** E-C difference significant at or beyond .001
level.


Figures, the test in this category most re- 
motely associated with “Plaza Sesamo” con- 
tent and goals, yielded significant differences 
for the 4-year-olds (and nearly significant 
differences for the 3-year-olds). 


Independent-Cognitive Measures 


Oral Comprehension. The one test among 
the many used for evaluation of “Plaza 
Sesamo” that, on the face of it, was com- 
pletely unrelated to the content and goals of 
the television program was Oral Compre- 
hension, the test developed for use with 
young bilingual children. As can be seen in 
Table 11, the results for all three age groups 
in the analysis of covariance were signifi- 
cant. The 4-year-olds showed the greatest 
gains from viewing “Plaza Sesamo,” a net 
difference of 5.3 points advantage over the 
controls. The test-retest stability ranges 
from .22 to .59 for the three age groups, in- 
dicating that the analysis of covariance adds 
significantly to the precision of the analy- 
sis. Results from the analysis of variance 
using repeated measures are essentially the 
same as those obtained for most of the other 
measures in this study. 

It is interesting to note that highly sig- 
nificant gains were made by the experimen- 
tal group, as contrasted to the control group, 
on Oral Comprehension even though it has 
no relation to the curriculum and goals of 
"Plaza Sesamo." This important finding 
suggests that there may well be general 
gains of a cognitive nature as a result of 
viewing “Plaza Sesamo.” Efforts should be
made to explore other cognitive, perceptual,
and linguistic measures that could be incorporated
into future evaluations of “Plaza
Sesamo.”


Other Measures 


In addition to the tests given individually 
to both experimental and control children,
observational ratings on attention were obtained
on a random sampling basis for every
child in the experimental group in two day-care
centers. The number of absences was
also recorded for every child in the study.
Because attention ratings
were only available for the experimental 
group, no analyses of variance or covariance 
could be undertaken. The highly skewed na- 
ture of the distribution of absences also pre- 
cluded such analyses. 

Within the combined experimental group,
the degree of attention to “Plaza Sesamo”
correlated significantly with pretest Numbers
(.32), during-test Classification Skills
(.32), and six tests from the posttreatment
battery: General Knowledge (.32), Numbers
(.35), Parts of the Whole (.30), Ability
to Sort (.49), Classification Skills (.33), and
Oral Comprehension (.41). The larger number
of appreciable correlations in the posttreatment
test battery is to be expected since
attention (or lack of it) would have a cumulative
effect over the six months of the
“Plaza Sesamo” programs. Clearly, degree
of attention to what is going on is important
to measure.

The number of times a child was absent
also influenced his performance on some of
the tests in the experimental group, frequently
yielding correlations in the .20s with
posttest measures of Oral Comprehension,
Classification Skills, Letters and Words, Relations,
and Numbers. Children with a large
number of absences did less well on the posttreatment
test battery than did children who
attended regularly. These results are not
surprising and indicate, as in the case of attention,
that the number of absences should
be carefully recorded in any evaluation
study of this type. The correlation between
degree of attention and number of absences
was -.26 (significant at the .05 level).




The outcome of this experimental study 
of preschool children, half of whom viewed 
"Plaza Sesamo" and half of whom did not, 
clearly indicates that significant gains are
made in a number of cognitive and percep- 
tual areas by those children who faithfully 
watch the television program. While such a 
finding is not surprising for those areas of 
specific achievement related to the curricu- 
lum content of the television programs, it is 
indeed significant that even measures of cog- 
nitive ability seemingly unrelated to the 
main thrust of the curriculum show measur- 
able improvement in children who viewed 
the program as contrasted to those who did 
not. 
ACQUISITION PROCESSES AND RESILIENCE UNDER VARYING
TESTING CONDITIONS FOR STRUCTURALLY DIFFERENT
PROBLEM-SOLVING PROCEDURES


RICHARD E. MAYER
Indiana University 


Binomial probability problems were taught by emphasizing either 
calculating with a formula (Sequence F) or meanings of the variables 
in the formula (Sequence G). Two main experiments involved a multi- 
leveled transfer posttest administered after the subject read some of 
his instructional booklet and variational testing conditions (e.g., open 
versus closed book, speed versus power). A Treatment × Posttest
interaction (TPI) resulted in which Sequence G subjects excelled on
interpretive items while Sequence F subjects excelled on near transfer,
and the Treatment × Posttest interaction did not differ reliably
among different points in learning or under different testing condi-
tions. Results indicated the importance of a subject's initial assimilative
set in acquiring new knowledge and the apparent resilience of the ac-
quired structure.


Previous studies (Egan & Greeno, 1973; 
Mayer & Greeno, 1972) have suggested 
that teaching subjects to solve mathemati-
cal problems by different instructional
methods may result in learning outcomes 
which differ in structural or qualitative 
ways. This inference was indicated by a 
pattern of posttest performance in which 
subjects in one instructional group excelled 
on one kind of transfer posttest item and 
subjects in another group excelled on an- 
other kind, producing a disordinal inter- 
action called Treatment x Posttest interac- 
tion or TPI. 

Development of structural difference. A 
major new question dealt with in the
present study is, How do the cognitive
structures that support these different pat-
terns of transfer performance develop over
the course of learning? Or, put another way,
How can we characterize the acquisition
processes for structurally different learning
outcomes? At least two kinds of theories
of the acquisition process have been pro-
posed: (a) A fairly straightforward idea,
one that follows from asking “how much is
learned,” is that apparent differences in what
is learned are due to some subjects acquir-
ing more of one kind of content and less of
another relative to other subjects. (b) A
more complex proposition, one that follows
from asking “what is learned” (e.g., Roug-
head & Scandura, 1968), is that different
kinds of learning outcomes are due to ac-
quisition processes in which the same con-
tent material is encoded within different
assimilative sets by different subjects. Al-
though the first proposal requires only an
analysis of the amount of material trans-
mitted, most recent educational [illegible]
have relied on modified versions of the sec-
ond proposal in which the subject's cognitive
activity or receptive set during learning as
well as the presented material must be
analyzed.

The present study attempts to provide 
information concerning the nature of the ac- 
quisition process for mathematical knowl- 
edge and to provide more direct information 
on the hypothesis stated by Mayer and 
Greeno (1972):


different instructional procedures could activate 
different aspects of existing cognitive structure. 
And since the outcome of learning is jointly deter- 
mined by new material and the structure to which 
it is assimilated, the use of different procedures 
could lead to the development of markedly differ- 
ent structures during the learning of the same con- 
cept [p. 165]. 


Resilience of structural differences. A 
second new question dealt with in the 
present study is, Once learning outcomes 
have been established, can the pattern of 
transfer performance be altered by testing 
manipulations such as open book versus 
closed book, or speed versus power condi- 
tions? In other words, how resilient are 
structurally different learning outcomes to 
varying testing conditions? Such manipula- 
tions represent an attempt to force subjects 
who have learned by different instructional 
methods to process “what is learned” in
the same way. If structural differences
“disappear” under these circumstances, that
is, the Treatment × Posttest interaction is
not present, the importance of the original
teaching method in establishing the struc-
ture of learning outcomes would be dimin-
ished.

Present study. In the present experi- 
ments, the concept of binomial probability 
was taught using expository, four-lesson
teaching booklets and two teaching methods
that differed in sequencing and emphasis.
One instructional method (Sequence F) be-
gan each lesson with a formal statement of
the rule or subrule and explained compo-
nent variables only within the context of
calculating with the formula. The other
method (Sequence G) began each lesson by
attempting to relate component variables
of the formula to the subject's general ex-
perience, for example, trials, outcomes, and
[illegible], before presenting any formal
statement of the rule. Thus, Sequence F
attempted to activate a narrow range of the
subject's experience such as performing
arithmetic computations, while Sequence G 
attempted to activate a broader receptive 
set in the subject including a wide range of 
general experience with probabilistic situa- 
tions. These procedures represent an at- 
tempt to provide subjects with what Ausu- 
bel (1968) has termed “rote” versus 
"meaningful learning sets" or what Greeno 
(1972) has called “algorithmic” versus 
“propositional knowledge." Learning was 
assessed by a multileveled transfer post- 
test which contained both near (i.e., items 
very similar to those in the training book- 
lets) and far (i.e., items requiring interpre- 
tation) items. 

To provide information on the acquisition 
question, the posttest was administered at 
three points in learning for subjects in both 
instructional groups. Amount 1 subjects 
were tested after the first two lessons 
(introduction and combinations), Amount 
2 subjects after three lessons (introduction, 
combinations, and joint probability), and 
Amount 3 subjects after all four lessons 
(introduction, combinations, joint proba- 
bility, and binomial probability) had been 
presented. Both Experiment 1 and Experi- 
ment 2 used this procedure, although the 
ordering of booklet lessons differed. In
addition, a supplemental study was con- 
ducted in which the subject, instead of tak- 
ing a transfer posttest, was asked to re- 
produce what he had just read as if he were 
explaining it to a naive learner. 

To provide information on the resilience 
question, the conditions of testing were 
varied. In Experiment 1, the testing vari- 
able was memory support with some subjects 
tested under open-book and others under 
closed-book conditions. In Experiment 2, 
the testing variable was time stress with 
some subjects tested under speed (time 
limit) and some under power (no-limit) 
conditions. In addition, a supplemental 
study was conducted which involved a two- 
day retention interval. 


EXPERIMENT 1 


This study attempted to assess both the 
consequences of varying the sequencing 
and amount of instruction for a mathemati- 
cal concept and the presence of memory 
support during testing. In each instructional
sequence, subjects were given varying 
amounts of instruction. All subjects took 
the same 30-item posttest, containing sev- 
eral kinds of test items, and the condition 
of testing was varied. 


Method 


Subjects and Design 


Subjects were 117 University of Michigan stu- 
dents who had volunteered to participate in psy- 
chological experiments at the Human Performance 
Center for pay. Nine subjects served in each cell 
of a 2 X 3 X 2 factorial design, with a thirteenth 
group of subjects serving as controls. The factors 
were method of instruction (Group F or Group 
G), amount of instruction (Amount 1, 2, or 3), 
and testing condition (open book or closed book). 
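The design arithmetic above can be checked with a short sketch. Python is used here purely for illustration and the variable names are hypothetical. The control group size of nine is inferred from the reported total of 117, and the error degrees of freedom assume a standard fully between-subjects analysis of variance on the experimental subjects only, which the text does not state explicitly.

```python
from itertools import product

# Factors of the 2 x 3 x 2 design (labels are illustrative).
methods = ["F", "G"]
amounts = [1, 2, 3]
testing = ["open book", "closed book"]
SUBJECTS_PER_CELL = 9

cells = list(product(methods, amounts, testing))
n_experimental = len(cells) * SUBJECTS_PER_CELL
n_control = SUBJECTS_PER_CELL  # assumption: the thirteenth group also has 9
n_total = n_experimental + n_control

# Assumption: between-subjects ANOVA, so error df = subjects - cells.
error_df = n_experimental - len(cells)

print(len(cells), n_experimental, n_total, error_df)  # 12 108 117 96
```

Under these assumptions the error term has 96 degrees of freedom, which matches the denominators reported for the between-subjects tests in the Results.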


Materials 


The two instructional texts (Sequence F and 
Sequence G) were incorporated into 2 four-lesson 
typewritten booklets with one to three pages in 
each lesson. The booklets were similar to those 
used in earlier studies (Mayer & Greeno, 1972) 
and have been reproduced in full elsewhere (Mayer,
[year illegible]). Instruction in Sequence F (for formula) con-
sisted of the following four lessons: (a) an intro-
duction, formally presenting the binomial formula
and showing how it could be broken down into
three smaller algorithms; (b) combinations, em-
phasizing how to obtain a value for $N!/[(N-R)!\,R!]$
(where N is number of trials and R is
number of successes); (c) joint probability, em-
phasizing how to obtain a value for $P^{R}(1-P)^{N-R}$
(where P is probability of successes); and (d) bi-
nomial probability, emphasizing how to find the
product of the above values to obtain the final
answer. Thus, Sequence F began each lesson with
the formal presentation of the formula or sub-
formula, and component concepts (e.g., trial,
failure, failure probability) were explained only in
relation to calculating with the formula. Through-
out instruction in Sequence F, the amount of inter- 
pretation of variables was minimized, and em- 
phasis was on numerical calculation. 
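The three smaller algorithms named in lessons (b) through (d) can be sketched as follows. This is an illustrative Python rendering, not material from the teaching booklets; the function names are hypothetical, and N, R, and P follow the definitions given above.

```python
from math import factorial


def combinations(n: int, r: int) -> int:
    """Lesson (b): number of combinations, N! / [(N - R)! R!]."""
    return factorial(n) // (factorial(n - r) * factorial(r))


def joint_probability(p: float, n: int, r: int) -> float:
    """Lesson (c): probability of one specific sequence, P^R (1 - P)^(N - R)."""
    return p ** r * (1 - p) ** (n - r)


def binomial_probability(p: float, n: int, r: int) -> float:
    """Lesson (d): product of the two values above."""
    return combinations(n, r) * joint_probability(p, n, r)


# Two successes in three trials with P = .5: 3 * .125
print(binomial_probability(0.5, 3, 2))  # 0.375
```

Splitting the calculation this way mirrors the lesson structure: each function corresponds to one subformula, and the final answer is simply their product.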

The instructional booklet for Sequence G (for 
general concepts) consisted of the following four 
lessons: (a) an introduction, presenting definitions 
of variables (e.g., number of trials, number of suc- 
cesses, etc.) in relation to general experience; (b) 
combinations, reemphasizing the relevant concepts 
before presenting the subformula for combinations;
(c) joint probability, reemphasizing relevant con-
cepts before presenting the subformula for joint
probability; and (d) binomial probability, which
tied the subformulas together conceptually before 
presenting the full binomial formula. Thus, Se- 
quence G began each lesson with individual vari-
ables and developed from parts to the whole con-
cept. Throughout the instruction in Sequence G,
emphasis was on explanations of the meanings of
concepts, and calculation was discussed only after a
full conceptual explanation had been given.

A test set consisted of 30 typewritten cards
representing five problem types, two problem for-
mats, and three problem content areas. The 5 ×
2 × 3 design yielded 30 cells, each representing
one of the 30 test items.
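The crossing of the three test dimensions can be enumerated directly. The sketch below is illustrative Python; the letter codes are the ones defined for each dimension in the paragraphs and figure captions of this report, while the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

problem_types = ["F", "T", "L", "Q", "U"]  # familiar, transformed, Luchins, question, unanswerable
problem_formats = ["F", "S"]               # formula, story
problem_contents = ["C", "J", "B"]         # combinations, joint probability, binomial

# One test item per cell of the 5 x 2 x 3 design.
items = list(product(problem_types, problem_formats, problem_contents))

per_type = Counter(t for t, _, _ in items)
per_format = Counter(f for _, f, _ in items)
per_content = Counter(c for _, _, c in items)

print(len(items))        # 30
print(per_type["Q"])     # 6
print(per_format["S"])   # 15
print(per_content["B"])  # 10
```

Each problem type is therefore represented by 6 items, each format by 15, and each content area by 10, so every proportion correct reported by dimension rests on a different number of items.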

The first transfer dimension was the type of
item. Familiar problems (F type) were presented
in the same way as example problems during train-
ing; transformed problems (T type) required a
transformation, usually of an algebraic nature, to
be put into familiar form; so-called Luchins prob-
lems (L type) presented a complicated-looking
situation which could be solved quite easily if the
subject would take a moment to "think"; question
items (Q type) asked a question about the vari-
ables in the formula rather than requiring a solu-
tion value; and unanswerable problems (U type),
although looking very much like F-type problems,
presented either insufficient or inconsistent in-
formation.

The second dimension was test problem format.
Formula problems (F format) were stated in terms
of N, R, and P, the formal notation used in pre-
senting the formulae in the teaching booklets;
story problems (S format) were stated in terms of
some situation not discussed in the teaching book-
lets, such as sampling peanuts from a barrel in
which some portion is rotten.

The third dimension was the content of the
problems. Combinations problems (C content)
asked for or dealt with C(N, R) or the number of
combinations; joint probability problems (J con-
tent) asked for or dealt with $P^{R} \times (1-P)^{N-R}$ or
the probability of a specific sequence; binomial
probability (B content) asked for or dealt with the
theme of the teaching booklets, finding P(R, N)
or the probability of R successes in N trials.

Examples of the test problems and answers are
published elsewhere (Mayer, 1973).

Additional materials were a subject record con-
sisting of questions to determine the extent of the
subject’s experience in statistics and probability
and a pretest designed to determine whether the
subject had sufficient computational skill to master
the material in the teaching booklets.


Procedure 


Subjects were run in groups averaging four per
session. First, the subject record and pretest were
administered. Subjects indicating no relevant ex-
perience with the binomial but making no more
than three computational errors were randomly
assigned to treatment groups.

The experimenter read the instructions and
again asked the subject to indicate familiarity
with the binomial. Then, each subject was given
the appropriate teaching booklet. Subjects were
instructed to read their booklets silently at
their own rate and to try to understand the ex-
planations and examples. Subjects were allowed 
to take notes or figure on a blank piece of paper. 
Subjects were told they would have 20 minutes. 
All finished within that time limit. Subjects who 
finished earlier were asked to sit quietly until the 
test began. 

Immediately following the reading period, the 
experimenter collected the teaching booklets and 
notes (except for memory support condition sub- 
jects), read the instructions for the test, and 
wrote the formula for P (R, N) on the chalkboard. 
Each subject was given a pile of 30 problem cards 
face down and a blank answer sheet. On the ex- 
perimenter's signal, the subject was to turn up the 
first card, copy its code number, show his work and 
circle his final answer, and then go on to the next 
card. There was no time limit, but all subjects were 
told that once they began a new card they could 
not go back to work on any previous card. All sub- 
jects were told to write “no answer" if they felt 
a problem was unanswerable. 

Nine different orders of presenting the test cards 
were constructed. The orders were random except 
for the constraint that, of the nine orderings, each 
item had to appear in the first one third (i.e., 1-10) 
in three of the sets, in the second one third (i.e.,
11-20) in another three of the sets, and in the last 
one third (i.e., 21-30) in the other three sets. 
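The report does not say how the nine orders were generated. One possible construction that satisfies the stated constraint is sketched below in Python: the items are split at random into three blocks of ten, the blocks are rotated through the three positions, and the whole procedure is repeated three times. The function name, the seed, and the rotation scheme are assumptions made for illustration only.

```python
import random


def make_orders(items, n_orders=9, n_thirds=3, seed=0):
    """Build card orders in which every item falls equally often in each third."""
    rng = random.Random(seed)
    block = len(items) // n_thirds
    orders = []
    for _ in range(n_orders // n_thirds):
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Randomly partition the items into three blocks of equal size.
        groups = [shuffled[i * block:(i + 1) * block] for i in range(n_thirds)]
        # Rotate the blocks so each block occupies each third exactly once.
        for shift in range(n_thirds):
            order = []
            for third in range(n_thirds):
                group = list(groups[(third + shift) % n_thirds])
                rng.shuffle(group)  # random order within the third
                order.extend(group)
            orders.append(order)
    return orders


orders = make_orders(list(range(1, 31)))
print(len(orders))  # 9
```

Because each rotation places an item in each third exactly once, three repetitions place it in each third in exactly three of the nine orders, while the order within a third stays random.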


Results 


The posttest performance of subjects who 
passed the pretest but expressed no famil- 
iarity with the binomial was scored with 
each item marked either correct or incor- 
rect. Answers were marked correct if they 
were in proper form even though computa- 
tion may have been incorrect or not carried 
out. The control subjects performed at very 
low levels, that is, an average of less than 
10% correct, thus indicating that the ex- 
perimental treatments had a substantial ef- 
fect. An analysis of variance was performed 
on the data of all experimental subjects. 
There was only a marginal difference in the 
overall performance of the two experimental
groups (F = 2.92, df = 1/96, p < .10), thus
frustrating the question of “which method is
best.”


Development of Structural Differences 


Figure 1 shows the proportion correct re-
sponse for the two instructional groups by
posttest item (i.e., Treatment × Posttest
interaction) for each of three amounts of
instruction and for each of three posttest
dimensions. For each of the three partitions
of the posttest set (i.e., by format, by type,
or by content), there is reason to suspect
that the final outcome (i.e., shown in the
right panels) of learning differed between
subjects in Sequence F and Sequence G,
in a manner established in previous studies
(Mayer & Greeno, 1972); that is, Sequence
F subjects excelled on near transfer and
Sequence G on far transfer.

Figure 1. Treatment × Posttest interaction by
amount of instruction for three posttest dimensions
in Experiment 1. (Problem format: F = formula,
S = story; problem type: F = familiar, T = trans-
formed, L = Luchins, Q = question, U = unanswer-
able; problem content: C = combinations, J =
joint probability, B = binomial probability.)

In order to determine how these final 
differences in cognitive structure developed 
over the course of learning, it is useful to 
compare the pattern of Treatment x Post- 
test interaction in the left panel (after two 
lessons) to the center panel (after three 
lessons) to the right panel (after all four
lessons). There is a significant overall effect
due to the amount of instruction (F = 4.49,
df = 2/96, p < .025), suggesting that add-
ing more lessons increased performance for
both instructional groups.

The top row of Figure 1 shows the In- 
structional Sequence x Problem Format 
interaction after each amount of instruc- 
tion. The two-way Sequencing X Format 
interaction was reliable (F = 14.72, df = 
1/96, p < .001), indicating structural dif- 
ferences in what was learned. This Treat- 
ment x Posttest interaction is clearly evi- 
denced in all three panels, and there was no 
reliable Sequence X Amount X Format in- 
teraction (F = 1.00, df = 2/96, p > .20); 
thus it is impossible to reject the hypothesis 
that the Treatment x Posttest interaction 
was the same at each point in learning. Al- 
though the performance of each sequence 
group reliably increased overall from 
Amount 1 to Amount 2 to Amount 3, there 
was no reliable Amount x Format interac- 
tion (F = 1.00, df = 2/96, p > .20). This 
suggests a proportional quantitative in- 
crease for both problem formats rather than 
a structural change as the amount of in- 
struction was increased. 

The middle row of Figure 1 shows the 
Instructional Sequence X Problem Type 
interaction after each amount of instruction. 
The two-way Sequencing x Type interac- 
tion was reliable (F = 6.28, df = 4/384, p < 
.001), suggesting structural differences simi- 
lar to those found in earlier studies. Even 
after Amount 1, there is evidence of this 
characteristic Treatment X Posttest inter- 
action with Sequence F slightly ahead for F 
type and Sequence G superior on Q type 
and U type. This pattern seems fairly con- 
sistent across all three amounts of instruc- 
tion, and there was no reliable Sequence x 
Type X Amount interaction (F = 1.03, df = 
8/384, p > .20); thus it is impossible to re-
ject the hypothesis that the interaction was the
same at each amount. Although the perform-
ance of each treatment group reliably in- 
creased overall from Amount 1 to Amount 2 
to Amount 3, there was no reliable Amount x 
Type interaction (F = 1.05, df = 8/384, p > 
.20), again suggesting that the increase was 
uniform across all problem types. These 
findings are consistent with the notion that 
the structure of learning outcomes remained
constant across all three amounts of instruc-
tion.

The bottom row of Figure 1 shows the 
Sequence X Problem Content interaction 
after each amount of instruction. The two- 
way Sequence X Content interaction was 
significant (F = 10.19, df = 2/192, p <
.001), again indicating a difference in the
structure of what was learned. There was no 
reliable Treatment x Content x Amount 
interaction (F = 1.77, df = 4/192, p > .15); 
thus, it is not possible to reject the notion 
that the structural differences among the 
instructional groups remained constant at 
each of the three points in learning which 
were tested. A marginally reliable Amount 
X Problem Content interaction (F = 2.54, 
df = 4/192, p < .050) suggests that, not
only did performance increase overall with 
the addition of more sections, but it also in- 
creased disproportionately more, as might 
be expected, on the content of the material 
covered. 

In summary, the three possible Sequence
× Posttest × Amount interactions rep-
resent separate tests of the “acquisition
question.” In all three cases, there was a
reliable two-way Treatment × Posttest in-
teraction, but there was no evidence of any
difference in the Treatment × Posttest in-
teraction among Amount 1, Amount 2, and
Amount 3. These results suggest that struc-
tural differences occurred early in learning
and remained fairly constant throughout.


Resilience of Structural Differences 


This study was also intended to provide
some information about the resilience of the
Treatment × Posttest interaction under
varying testing conditions. Figure 2 shows
the proportion correct response for the two
instructional sequences by posttest item
(i.e., Treatment × Posttest interaction) for
each of two levels of memory support and
for each of three posttest dimensions. The
performance for the instructional sequence
groups under memory-support or open-book
(left panel) and no memory-support or
closed-book (right panel) conditions is
given by problem format in the top row,
by problem type in the middle row, and by
problem content in the bottom row. Open-
book testing seems to have helped to slightly
increase posttest scores overall for both
instructional groups, although the increase
failed to reach a statistically significant
level (F = 1.44, df = 1/96, p > .20).

Figure 2. Treatment × Posttest interaction by
testing condition for three posttest dimensions in
Experiment 1. (Problem format: F = formula, S =
story; problem type: F = familiar, T = trans-
formed, L = Luchins, Q = question, U = unanswer-
able; problem content: C = combinations, J =
joint probability, B = binomial probability.)

For both instructional groups, there was 
some hint that memory support helped to
disproportionately increase performance on
harder problems (far transfer items), espe-
cially for S-format relative to F-format
items. Marginally significant Type × Test-
ing Condition (F = 2.12, df = 4/384, p <
.10), Format × Testing Condition (F =
4.79, df = 1/96, p < .05), and Content ×
Testing Condition (F = 2.57, df = 2/192,
p < .10) interactions offer only weak sup-
port for this observation. 

An analysis of resilience of structural 
differences under varying testing conditions 
is made uncertain by the fact that there was 
no evidence that the memory-support ma- 
nipulation produced a reliable effect. How-
ever, a comparison of right and left panels
in each row of Figure 2 indicates that the
Treatment × Posttest interaction was pres-
ent under both open-book and closed-book
conditions, and this observation is consist-
ent with a failure to obtain a reliable three-
way interaction among instructional se-
quence, type of posttest item, and testing
condition (F < 1.00, df = 4/384, p > .20),
among sequence, problem format, and testing
condition (F = 2.79, df = 1/96, p > .10),
or among sequence, problem content, and
testing conditions (F < 1.00, df = 2/192,
p > .20).

Open-book testing seems to have very 
little effect on either the amount or the 
structure of problem-solving performance. 
The fact that the test followed learning 
almost immediately and that the binomial 
formula was on the chalkboard for all sub- 
jects may have reduced the impact of open- 
book testing. Apparently, subjects in the 
closed-book condition were able to generate 
problem solutions quite well with only their 
existing knowledge and the formula on the 
chalkboard. These results suggest that the 
structural differences obtained do not de- 
pend on subjects having to remember mate- 
rial. Apparently, failures in the test are due
to failures to understand things rather than 
failures to retain information. 


EXPERIMENT 2 


Experiment 2 was intended to replicate 
and extend the results of Experiment 1. The 
main changes in Experiment 2 were that the 
ordering of instructional lessons was intro- 
duction, joint probability, combinations, 
and binomial probability, and the testing
condition variable was speed or time stress 
(limit of 90 seconds per problem) versus 
power or no time stress (no limit). In Ex- 
periment 2, no subjects received memory 
support during testing. 


Method 


As in Experiment 1, 9 subjects served in 
each cell of a 2 x 3 x 2 design with the 
factors being instructional sequence (Se- 
quence F and Sequence G), amount of in- 
struction (Amounts 1, 2, and 3), and testing 
condition (time stress and no time stress). 
The 108 subjects were recruited from a pool 
of University of Michigan students who had 
volunteered to participate in psychological 
experiments at the Human Performance 
Center for pay. 

Because of the change in ordering, sub- 
jects receiving Amount 1 studied the intro- 
duction section and joint probability sec- 
tion; subjects given Amount 2 had the 
introduction followed by joint probability 
and combinations, and subjects given 
Amount 3 received all four sections in the 
order: introduction, joint probability, com- 
binations, and binomial probability. 

The transfer test performance was scored 
and analyzed as in Experiment 1. As in 
Experiment 1, there was no reliable differ- 
ence in overall performance between the two 
treatment groups (F = 1.80, df = 1/96, 
p > .15).


Results and Discussion 


Effect of Ordering 


A variable that was changed from Experi- 
ment 1 to Experiment 2 was the ordering of 
the parts or sections in the instructional 
booklets. For example, Amount 1 contained 
information about an introduction and com- 
binations in the ABC ordering of Experi- 
ment 1, but Amount 1 contained informa- 
tion about an introduction and joint 
probability in the BAC ordering of Experi- 
ment 2. Some indication of the effect of 
ordering and the respective importance of 
the combinations and joint probability sec- 
tions is provided by comparing performance 
across experiments, although interpretation 
of these data is uncertain since the two ex-
periments were not run concurrently nor 
under the same testing conditions. 

The overall proportion correct response 
with Amount 1, Amount 2, and Amount 3
was .38, .40, and .46, respectively, for Ex-
periment 1 and .25, .34, and .43, respec-
tively, in Experiment 2. In both experi-
ments, a significant effect due to amount of
instruction was obtained (Experiment 1:
F = 4.47, df = 2/96, p < .025; Experiment
2: F = 22.10, df = 2/96, p < .001). How-
ever, a major difference is that subjects in
Experiment 1 showed relatively good per-
formance after Amount 1 and very little im-
provement from Amount 1 to Amount 2,
while subjects in Experiment 2 showed rela-
tively poor performance after Amount 1 and
very much improvement from Amount 1 to
Amount 2. 

Apparently, the combinations section (in-
troduced in Amount 1 of Experiment 1 and
Amount 2 of Experiment 2) was much more
important in increasing performance than
the joint probability section (introduced in
Amount 2 of Experiment 1 and in Amount 1
of Experiment 2). Perhaps the concept of
combinations is less intuitive or less familiar
than the concept of joint probability; but
whatever the reason, the present study in-
dicates that in teaching the concept of
binomial probability, the most important
component to teach is the concept of com- 
binations. The ordering of presentation, 
however, apparently had little or no effect 
on final outcome. 
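The comparison across experiments comes down to simple differences between the reported proportions. A minimal Python sketch, using only the six proportions given above (the variable and function names are hypothetical):

```python
# Overall proportion correct after Amounts 1, 2, and 3, as reported above.
exp1 = [0.38, 0.40, 0.46]  # order: introduction, combinations, joint probability, binomial
exp2 = [0.25, 0.34, 0.43]  # order: introduction, joint probability, combinations, binomial


def gains(props):
    """Improvement from each amount of instruction to the next."""
    return [round(later - earlier, 2) for earlier, later in zip(props, props[1:])]


print(gains(exp1))  # [0.02, 0.06]
print(gains(exp2))  # [0.09, 0.09]

# Difference at Amount 1, where the booklets differ only in the second lesson.
print(round(exp1[0] - exp2[0], 2))  # 0.13
```

Adding the joint probability lesson in Experiment 1 raised performance by .02, whereas adding the combinations lesson in Experiment 2 raised it by .09. As the text cautions, the two experiments were not run concurrently, so these differences are descriptive only.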


Development of Structural Differences 


Figure 3 shows the proportion correct
response for the two instructional groups by
posttest item (i.e., Treatment × Posttest
interaction) for each of three amounts of in-
struction and for each of three posttest
dimensions. As in Experiment 1, and as can
be seen in the right panels of the three rows
in Figure 3, the characteristic patterns of
the Treatment × Posttest interaction were
present, suggesting that the final learning
outcomes of subjects in Sequence F and
Sequence G were structurally different.

Again, special attention should be paid
to whether the Treatment × Posttest inter-
action occurred at all points in learning (as
found in Experiment 1) and whether both
treatment groups performed at equal levels
at each point in learning (as found in Ex-
periment 1).

Figure 3. Treatment × Posttest interaction by
amount of instruction for three posttest dimensions
in Experiment 2. (Problem format: F = formula,
S = story; problem type: F = familiar, T = trans-
formed, L = Luchins, Q = question, U = unanswer-
able; problem content: C = combinations, J =
joint probability, B = binomial probability.)

The top row of Figure 3 shows the per- 
formance of the two instructional groups by 
problem format at each of three points in
learning. The two-way Sequence × Format
interaction, in which Sequence F subjects
excelled on F-format and Sequence G sub-
jects performed better on S-format items,
was reliable (F = 21.65, df = 1/96, p <
.001), again suggesting a structural differ-
ence in what was learned by subjects in the
two treatment groups. The Sequence ×
Amount interaction (F = 3.30, df = 2/96,
p < .05) was manifested in the fact that 
Sequence G outperformed Sequence F for 
both formats after Amount 1 but evened 
out to the usual Treatment x Posttest inter- 
action after Amounts 2 and 3. Unlike Ex- 
periment 1, apparently, lack of specific con- 
tent material was more detrimental for 
Sequence F subjects relative to Sequence G. 

In addition, the same pattern of Treat- 
ment X Posttest interaction was obtained 
at each of three points in learning, although 
the interaction was not disordinal after 
Amount 1 largely due to the poor per- 
formance of subjects in Sequence F. The 
general thrust of these results seems to be 
as follows: with Sequence F, there is so 
little learned in Amount 1 that subjects can 
be said to have little or no structure, and 
when structure begins to develop, it has the 
characteristic features that distinguish it 
from Sequence G. However, the failure to 
obtain a reliable three-way Sequence X 
Format x Amount interaction (F = 2.49, 
df = 2/96, p > .10) allows us to retain the 
hypothesis stated in Experiment 1 that 
structural differences began quite early and 
did not change much throughout learning. 

The middle row of Figure 3 shows the per- 
formance of the two instructional groups by 
problem type at each of three points in 
learning. Although Sequence F subjects per- 
formed better on F-type problems while 
Sequence G subjects performed better on 
Q- and U-type problems as was found in 
Experiment 1, the two-way Sequence X 
Type interaction failed to reach a statisti- 
cally significant level (F = 1.68, df = 
4/384, p > .15). An investigation of the 
marginally significant three-way interac- 
tions—Sequence x Type X Content (F = 
2.14, df = 8/768, p < .05) and Sequence X 
Type x Format (F = 2.40, df = 4/384, p < 
.05)—indicates that the Sequence x Type 
interaction was strongest for B content and 
F format. 

Since the Treatment x Posttest interac- 
tion was complicated by content and format 
and was not statistically significant in this 
case, there is little point in trying to locate 
where the Treatment x Posttest interaction 
began; however, there was no evidence that 
the pattern of Treatment × Posttest inter-
action was influenced by the amount of
instruction. The failure of the Sequence ×
Amount × Type interaction to reach statis-
tical significance (F = 1.17, df = 8/384,
p > .20) is consistent with the results
of Experiment 1, indicating no structural 
change in how subjects in the two instruc- 
tional groups encoded material over the 
course of learning. 

An interesting fact, however, as observed
with respect to problem format, is that 
Sequence G was superior to Sequence F on 
all problem types after Amount 1, but that 
they tended to even out after Amounts 2 
and 3. A reliable Sequence X Amount inter- 
action (F = 3.30, df = 2/96, p < .05) veri- 
fies this observation and permits the claim 
that adding the combinations sections 
(Amount 2) helped Sequence F subjects 
more than Sequence G subjects.

Finally, the performance of the two in- 
structional groups by problem content at 
each of the three points in learning is shown 
in the bottom row of Figure 3. As in Experi- 
ment 1, a reliable Treatment x Posttest 
interaction suggests structural differences 
were obtained, with Sequence F subjects
superior on B-content and Sequence G
superior on C- and J-content items (F =
13.09, df = 2/192, p < .001). 

Again, the pattern of the Sequence X 
Amount interaction was that Sequence G 
outperformed Sequence F for all content 
areas after Amount 1 but evened out after 
Amounts 2 and 3. Again, the same general 
pattern of Treatment X Posttest interaction 
seems to have been present, at least in some 
form, at each of three points in learning. 
However, unlike Experiment 1, a reliable 
Sequence X Amount X Content interaction 
(F = 2.62, df = 4/192, p < .05) indicated 
that the structural differences changed as 
the amount of instruction was increased. 
Apparently, the presentation order used in 
Experiment 2, in which the more necessary 
content was not presented until Amount 2,
helped focus on the disproportionate im- 
portance of specific content for incomplete 
structures of Sequence F subjects relative 
to Sequence G subjects.


Resilience of Structural Differences


Experiment 2 used the variable of time
stress during testing (i.e., speed versus
power) in another attempt to disrupt both
the qualitative (or structural) and quanti-
tative (or overall) features of the perform-
ance of subjects who had acquired different
cognitive structures. Figure 4 shows the pro-
portion correct response for the two instruc-
tional sequences by posttest item (i.e.,
Treatment x Posttest interaction) for each
of two levels of time stress and for each of
three posttest dimensions. The performance
for the instructional groups under speed or
time stress (right panel) and power or no
time stress (left panel) conditions is given
by problem format in the top row, by prob-
lem type in the middle row, and by problem
content in the bottom row. Forcing subjects
to answer within 90 seconds produced an 
overall decrement in performance (F = 125, 
df = 1/96, p < .001) manifested in the ob- 
servation that the curves in the right panel 
are shifted down from those in the left 
panel. However, for each row, the same pat- 
tern of Treatment x Posttest interaction 
seems present both under time stress and no 
time stress. The failure to obtain a reliable
Sequence x Format x Testing Condition
(F < 1.00, df = 1/96, p > .20), Sequence x
Type x Testing Condition (F < 1.00, df =
4/348, p > .20), or Sequence X Content X
Testing Condition (F < 1.00, df = 2/192,
p > .20) interaction confirms the lack of
evidence that the 90-second limit had any
effect on the structure or quality of learning
outcomes.

FIGURE 4. Treatment X Posttest interaction by
testing condition for three posttest dimensions in
Experiment 2. (Problem format: F = formula,
S = story; problem type: F = familiar, T = trans-
formed, L = Luchins, Q = question, U = unanswer-
able; problem content: C = combinations, J =
joint probability, B = binomial probability.)


SUPPLEMENTAL STUDIES


In addition to the two main experiments,
two smaller, supplemental studies were con-
ducted. The supplemental studies used the
same instructional materials and procedure
as the main studies but varied the testing
situation in order to provide more informa-
tion about the resilience and development of
differences in learning outcomes.


Supplemental Study 1


One study focused on retention perform-
ance in an attempt to test the endurance
over time of the structural differences estab-
lished by different instructional sequences.


Method 


Nine subjects received the Sequence F, Amount
3, Order ABC booklet and nine other subjects re-
ceived the Sequence G, Amount 3, Order ABC
booklet. The 30-item transfer test was given two days
later, with no memory support and no time stress.


Results and Discussion 


In partial replication of earlier findings, 
Sequence F subjects reached a higher pro- 
portion of correct response than Sequence G 
subjects on near transfer items such as F 
type (.59 versus .41), T type (.39 versus 
.33), F format (.54 versus .44), and B con- 
tent (.44 versus .30), while Sequence G out- 
performed Sequence F on far transfer items 
such as U type (.44 versus .26), S format 
(.35 versus .30), C content (.44 versus .40), 
and J content (.43 versus .41). 
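
The crossover pattern in these proportions can be checked with a few lines of arithmetic. The sketch below is only an illustration: the proportions are those reported in the preceding paragraph, while the variable names and the helper `advantage_of_f` are hypothetical and not part of the original analysis.

```python
# Proportions correct reported for Supplemental Study 1: (Sequence F, Sequence G).
near_transfer = {
    "F type": (0.59, 0.41),
    "T type": (0.39, 0.33),
    "F format": (0.54, 0.44),
    "B content": (0.44, 0.30),
}
far_transfer = {
    "U type": (0.26, 0.44),
    "S format": (0.30, 0.35),
    "C content": (0.40, 0.44),
    "J content": (0.41, 0.43),
}


def advantage_of_f(items):
    """Return Sequence F minus Sequence G for each item class (hypothetical helper)."""
    return {name: round(f - g, 2) for name, (f, g) in items.items()}


near_diff = advantage_of_f(near_transfer)
far_diff = advantage_of_f(far_transfer)

# Positive values favor Sequence F; negative values favor Sequence G.
for name, diff in {**near_diff, **far_diff}.items():
    print(f"{name:10s} F - G = {diff:+.2f}")
```

Every near transfer difference is positive and every far transfer difference is negative, which is the sign reversal that the Treatment X Posttest interaction describes; the size of each difference says nothing by itself about statistical reliability.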

As in the main studies, an analysis of 
variance revealed no significant difference 
between the groups in overall performance 
and found Treatment X Posttest interaction 
for type (F = 3.58, df = 4/64, p < .025)
and Treatment x Posttest interaction for 
format (F = 5.60, df = 1/16, p < .05) to be 
at statistically reliable levels. The Sequence 
x Content interaction failed to reach signif- 
icance (F = 2.09, df = 2/30, p > .10); 
however, this may be a reflection of the low 
number of subjects involved. A significant 
three-way Sequence x Type X Format in- 
teraction (F = 5.41, df = 4/64, p < .001) 
indicates that the Sequence X Type inter- 
action was much stronger for F-format 
items than for S-format items. As with the 
memory-support and time-stress variables, 
there was no strong evidence that the reten-
tion interval altered the structure of learn-
ing outcomes. 


Supplemental Study 2 


The other supplemental study used a 
modified “method of reproduction” to assess 
a subject’s judgment of what was supposed 
to be learned at each of three points in 
learning for the two instructional groups. 


Method 


Six subjects each were presented with either the
Sequence F, Amount 1, Order ABC booklet; the
Sequence F, Amount 2, Order ABC booklet; the
Sequence F, Amount 3, Order ABC booklet; the
Sequence G, Amount 1, Order ABC booklet; the
Sequence G, Amount 2, Order ABC booklet; or the
Sequence G, Amount 3, Order ABC booklet. All
subjects were then given an immediate retention
test and questionnaire. The test was to write down,
with no memory support or time stress, what had
been in the teaching program, as if the subject
were trying to teach it to another subject coming
in for the next session. The traditional method of
reproduction was modified in that the subject was
encouraged to "make sense" out of the material as
if he were explaining what he had just learned
to someone who did not already know it rather
than reproduce the material word for word.


TABLE 1
AVERAGE NUMBER OF ELEMENTS IN REPRODUCTION FOR TWO
INSTRUCTIONAL GROUPS AND THREE AMOUNTS OF INSTRUCTION:
SUPPLEMENTAL STUDY 2

Instructional          Amount of instruction
sequence             1       2       3     Average

Average no. words
F                   110     152     243     168
G                   259     278     206     247
Average             184     215     225

Average no. symbols
[values missing]

Average no. words and symbols
F                   205     339     433     326
G                   370     385     293     350
Average             287     362     363

Note. Abbreviations: F = formula and G = general
concepts.

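
The printed cells of Table 1 allow a rough check on the missing symbol counts and on the direction of change across amounts of instruction. The following is a minimal sketch under two assumptions: that the "words and symbols" means are the sum of the word and symbol means, and that subtracting rounded cell means gives only approximate values. The function names are hypothetical, and the derived symbol counts are inferred here rather than taken from the table.

```python
# Cell means as printed in Table 1 of Supplemental Study 2, for Amounts 1, 2, 3.
words = {"F": (110, 152, 243), "G": (259, 278, 206)}
words_and_symbols = {"F": (205, 339, 433), "G": (370, 385, 293)}


def implied_symbols(seq):
    """Words-and-symbols mean minus words mean: an inferred value, not a table entry."""
    return tuple(ws - w for ws, w in zip(words_and_symbols[seq], words[seq]))


def change_from_first_to_last(cells):
    """Amount 3 mean minus Amount 1 mean."""
    return cells[-1] - cells[0]


for seq in ("F", "G"):
    print(
        seq,
        "implied symbols:", implied_symbols(seq),
        "| change in words, Amount 1 to 3:", change_from_first_to_last(words[seq]),
    )
```

Under these assumptions the word output of Sequence F rises across amounts while that of Sequence G falls, and the implied symbol counts are higher for Sequence F at Amounts 2 and 3.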

Results and Discussion 


The protocols from each subject were 
coded for number of words and number of 
symbols, with each operator, each number 
and each formal notation character count- 
ing as a symbol (e.g., N! counts as 2 sym- 
bols; 15/2 counts as 3 symbols). Table 1 
shows the average number of words, the 
average number of symbols, and the aver- 
age number of both words and symbols 
given at each point in learning by subjects 
in the two instructional groups. Separate
analyses of variance were performed on
these data, yielding some preliminary in-
formation about how the subject stored the
material and about his ability to detect the
emphasis of the teaching booklet at each of
three points in learning. The protocols show
that subjects in Sequence F used signifi-
cantly more symbols overall (F = 12.97,
df = 1/30, p < .005) and significantly fewer
words overall (F = 9.25, df = 1/30, p <
.005) in explaining what was taught than
Sequence G subjects.

A more striking finding is that as the
amount of instruction increased, the output
of both number of words and number of
symbols by subjects in Sequence F in-
creased while the output by subjects in
Sequence G stayed about the same or went
down. Reliable Sequence x Amount inter-
actions for words (F = 5.10, df = 2/30,
p < .025), for symbols (F = 5.63, df =
2/30, p < .01), and for both (F = 6.91,
df = 2/30, p < .005) support this observa-
tion. It appears that Sequence F subjects
were adding more and more discrete pieces
of information while Sequence G subjects
were forming a tighter, more streamlined
structure, which was less dependent on
material presented by the experimenter
since more of the knowledge that was al-
ready stored could be used.


GENERAL DISCUSSION


In previous attempts (Mayer & Greeno,
1972) to describe the structural differences
in what is learned by subjects in different
instructional groups, it has been useful to
postulate two dimensions of cognitive struc-
turing: internal connectedness and external
connectedness. Internal connectedness refers
to the degree to which the variables of the
formula are related to one another in the
subject's cognitive structure, for example,
P is related to R by exponentiation. External
connectedness refers to the degree to which
variables of the formula are related to
knowledge already existing in the subject's
cognitive structuring, for example, P is
related to past experience with probability
such as weather forecasts (20% chance of
rain), dice (each number has 1/6 proba-
bility), batting average (.333 means the
batter gets a hit 1/3 of his times at bat).




In the present situation, there is evidence 
that Sequence F subjects developed struc- 
tures with strong internal connectedness but 
weak external connections, while Sequence 
G subjects developed structures with strong 
external connectedness but weak internal 
connections. This hypothesis is consistent 
with the observed Treatment X Posttest in- 
teraction (with each of the three posttest 
dimensions) which generally showed that 
Sequence F subjects excelled on near trans- 
fer items which require an exact application 
of the formula and that Sequence G subjects 
excelled on far transfer items which require 
a more sophisticated understanding and 
interpretation of the component variables. 


Development of Structural Differences 


The present study adds some important 
new information on how these structures 
develop and helps provide a reasonable 
description of the acquisition process. 

The main results with respect to this
"acquisition question" were as follows:

1. The Treatment X Posttest interaction 
was consistently observed at all three points, 
that is, Amounts 1, 2, and 3, over the course 
of learning (Experiments 1 and 2). If learn- 
ing involved adding more and more of dif- 
ferent content material to memory, then the 
expected result would be a gradual emer- 
gence of Treatment x Posttest interaction. 
Instead, it seems that an assimilative set is
evoked quite early in learning and that con- 
tent material is structured within the con- 
text of the set over the entire course of 
learning.

2. Sequence G subjects performed better
overall relative to Sequence F subjects when
specifically needed content material, that is,
the combinations lessons, was lacking (Ex-
periment 2). One interpretation is that the
richer assimilative set of Sequence G sub-
jects, that is, the rich body of existing rele-
vant knowledge, helped create original
problem solutions while the narrower as-
similative set of Sequence F subjects made
problem solving more dependent on specific
content material.

3. As the amount of instruction was in-
creased, Sequence F subjects produced
longer and longer reproduction protocols,
while the protocols of Sequence G subjects
remained about the same or decreased (Sup- 
plemental Study 2). Again, these results 
seem most consistent with the idea that 
Sequence F subjects were adding material 
to fairly narrow assimilative sets while 
Sequence G subjects were integrating and 
streamlining material to fit within a rich 
bank of existing knowledge. 

The results of the present study seem 
consistent with the notion that different 
receptive sets are activated early in learn- 
ing. Furthermore, the differences in how the 
material is encoded are established early 
and remain constant throughout learning. 
Greeno (1972) has provided a framework 
for analyzing problem-solving behavior that 
relies on a distinction between two kinds of 
knowledge stored in semantic memory— 
“propositional knowledge” and “algorithmic 
knowledge.” Propositional knowledge refers 
to relational and conceptual information 
such as hierarchies of classes and subsets 
(collies are dogs), properties of classes of 
things (dogs have tails), and facts (April 
17-23 is National Dog Week). Algorithmic 
knowledge refers to rules or operations such 
as the procedures followed in doing long 
division. 

In the present example, it seems that the 
organization and emphasis of Sequence F 
encourages the use of a narrow, assimilative 
set concerned with mathematical operations 
and calculations—what could be called 
algorithmic knowledge—and that Sequence 
G encourages the activation of a broader, 
more integrative set made up of the sub- 
ject’s general experience—what could be 
called propositional knowledge. Since what 
is learned is the product of both the pre- 
sented material and the assimilative set the 
subject uses to encode it, different learning 
outcomes are possible. What these results 
seem to show is that the algorithmic kind of 
set is activated quite early in learning for 
Sequence F and that the propositional kind 
of set is activated quite early in learning for 
Sequence G; further, there is support for the 
idea that the structural differences observed 
at the end of learning have their roots in 
this embedding of material into two dif-
ferent kinds of assimilative sets or of knowl-
edge which begins early and continues
throughout learning. 


Resilience of Structural Differences 


The main results with respect to the 
“resilience question" were as follows: (a) 
Memory-support or open-book testing had 
a mildly positive overall effect on problem- 
solving performance, but there was no evi- 
dence of any change in the pattern of Treat- 
ment X Posttest interaction (Experiment 
1). (b) Time stress, the speed condition, had 
a strong overall inhibitory effect on prob- 
lem-solving performance, but there was, 
again, no evidence of any effect on the pat- 
tern of Treatment X Posttest interaction 
(Experiment 2). (c) A two-day retention 
interval also failed to produce disruptions of 
the usual patterns of Treatment x Posttest 
interaction (Supplemental Study 1). Thus, 
all attempts to alter the structure or quality 
of problem-solving performance failed. Ap- 
parently, once the problem-solving rule is 
encoded, testing conditions can influence the 
absolute level of performance but with negli- 
gible effects on the structure of problem- 
solving performance. 


Relation to Discovery Learning and 
Creative Problem Solving 


These results also give some hints about 
the prerequisites for discovery learning and 
creative problem solving. For the kind of 
teaching whose goal is creative problem 
solving, such as displayed by Sequence G, 
Amount 1 subjects in Experiment 2 (i.e., 
inventing solutions with very little specific 
content material having been given), it is 
clear that a substantial bank of what has 
been called “propositional knowledge” must 
be available. For subjects who do not have 
a well integrated set of general experiences 
in the required area (e.g. in this case, in 
probability of events), it seems that at- 
tempts to achieve learning outcomes that 
can support creative problem solving will 
fail. This is so because the kind of learning 
outcome that supports creative problem 
solving (e.g. as displayed by Sequence G 
subjects) is acquired by embedding the
problem-solving rule in a bank of the ap-
propriate propositional knowledge. Instead
of demanding that subjects who lack the 
appropriate prerequisite knowledge learn a 
given problem solution rule, thereby insur- 
ing either no learning or a very specific kind 
of encoding, it seems a better educational 
practice to first provide subjects with the 
necessary prerequisite concepts. Previous 
findings (Egan & Greeno, 1973) that show 
individual differences in specific prerequisite 
knowledge to be far more important for 
“discovery” learning methods than for 
“rule” learning methods support this argu- 
ment. 

In short, the results seem to indicate that 
for those kinds of learning whose goal is a 
quick, efficient ability to perform a given set 
of operations (e.g., arithmetic operations),
only a narrow set of existing knowl-
edge is necessary; but for learning that sup-
ports creative problem solving (e.g., recon-
structing a procedure for new problems), it
is essential to make sure that the subject
possesses the prerequisite knowledge. These
results suggest that discovery teaching to
subjects who lack the appropriate proposi-
tional knowledge will not result in discovery
learning or creative problem solving and
may result in no learning at all.


The mental age (MA) and IQ of three samples of nursery school chil-
dren varying in chronological age (CA) were ascertained at two points
in the school year. The testing points were chosen to conform to a
design which permitted the non-CA-related influences of schooling to
be estimated. The results revealed that both CA and amount of time
in school were positively related to MA. A similar relation between
IQ and length of schooling was also found. The implications of these
results were discussed in terms of both the direct and indirect in-
fluences of schooling on intellectual functioning and in terms of po-
tential sampling biases inherent in the norming procedures for avail-
able standardized tests.


One of the objectives of developmental
investigations is the identification of the 
components which contribute to age-related 
changes in performance. Schaie (1965), for 
example, has suggested that individuals can 
be described according to three such compo- 
nents: chronological age (CA), cohort 
(birth date), and time of measurement. 
Furthermore, he suggested that perform-
ance differences between populations vary-
ing in chronological age represent the com-
posite effects of these three components
rather than of age changes alone and that
conventional research designs used in devel-
opmental research confound these compo-
nents.

For example, age differences found using
cross-sectional research designs (where
samples of subjects varying in chronologi-
cal age are tested at the same point in
time) may represent age changes, cohort
changes, or their joint influences. Similarly,
age differences in performance resulting
from the use of a longitudinal research de-
sign (where subjects from the same cohort
are tested at different points in time) may
be attributed to age changes, to time of
measurement differences, or to their joint
influences. Such inferences point to the lim-
ited usefulness of using simple longitudinal
or cross-sectional designs in descriptive de-
velopmental research and also suggest a
possible explanation for the disparities in
the nature of developmental functions ob-
tained using these data collection strategies.

* Requests for reprints should be sent to L. R.
Goulet, Department of Educational Psychology,
University of Illinois, Urbana, Illinois 61801.

The present study had three purposes: 
(a) to compare longitudinal and cross-sec- 
tional age gradients in intellectual function- 
ing in young children; (b) to identify age- 
related and time-related (specifically nurs- 
ery school experiences) changes on such in- 
tellectual functioning; and (c) to examine 
the stability of the estimate of IQ over pe- 
riods of time when the population of inter- 
est was enrolled in nursery school. 
Available data for school-age children 
suggest that longitudinal age gradients in 
intellectual functioning provide estimates of 
change which exceed the estimates provided 
by cross-sectional age gradients (e.g.,
Baltes & Reinert, 1969; Schaie, 1972). Such 
a disparity may be expected, since longitu- 
dinal measurement, by definition, provides 
estimates of change which confound age- 
related and school-related influences on per-
formance. In other words, over the period of
a school year, the child is not only profiting
from the influences of school but is also "aging"
or developing. In contrast, within-grade
cross-sectional comparisons do not incorpo- 
rate such confounding, since performance is 
assessed for all samples at the same point in 
time (thereby controlling for the amount of 
time spent in school). The present data pro- 
vide evidence regarding the generalizability 
of this phenomenon for nursery-school-age 
children. 
Correlated with the first purpose is the 
present concern with identifying the vari- 
ance associated with age and school experi- 
ence during the period of time that the child 
is enrolled in nursery school. That is, even 
though age and amount of schooling are 
inextricably correlated for individual chil- 
dren, it is possible to obtain independent 
estimates of the influences of schooling. 
This is possible since children within a 
grade vary in the age of school entry. Thus, 
it is possible to compare the performance of 
a sample of subjects at a point early in the 
school year with that of a matched CA 
sample tested at a later point in the school 
year. The available data permitting such 
comparisons suggest that the influences of 
schooling equal or exceed those related to 
CA for samples enrolled in the first grade 
(Schaie, 1972). Similar conclusions are pos- 
sible at least through the fourth grade 
(Baltes, Baltes, & Reinert, 1970; Baltes & 
Reinert, 1969). Again, however, no data are 
available for younger children. 
Finally, the present study provides data
concerning the sensitivity of the IQ meas-
ure to the influences of schooling. The Pea-
body Picture Vocabulary Test (Dunn,
1965), which was used for present purposes,
was standardized using cross-sectional sam- 
pling procedures during the months of 
April-June. None of the children younger 
than six years old in the normative sample 
were enrolled in school. The present testing 
took place at two points in the school year 
over which three samples of children vary- 
ing in CA were each tested twice. Thus, at 
the second testing period, the children were 
both older and had more school experience. 
It follows, therefore, that the IQ scores for 
the second time period should exceed those 
obtained for the first time period even
though the transformation from mental age
(MA) to IQ controls for influences related
to CA.


METHOD 


Design 


The experimental design was a 3 X 2 (Chrono- 
logical Age X Time of Testing) factorial. All sub- 
jects were tested twice, once in the fall of the 
school year (October-November) and once in the 
spring (March-April). All subjects were adminis-
tered Form A of the Peabody Picture Vocabulary
Test in the fall, whereas half of the subjects re- 
ceived Form A and the remainder received Form 
B at the spring testing. Approximately equal num- 
bers of boys and girls were included in each CA 
group. 


Subjects and Procedure 


The subjects were 63 preschool children (31 
boys, 32 girls) enrolled in one of three preschools 
in Champaign-Urbana, Illinois. The subjects were 
randomly chosen from those available within the 
three age groups. The mean CAs for the young,
middle, and old groups at the two times of testing
are contained in Table 1.

The standard procedures for administration of 
the Peabody Picture Vocabulary Test were fol- 
lowed with the exception that testing began at 
Item 1 of the test. The MA and IQ scores were
taken from tables available in the test manual
(Dunn, 1965). Testing was conducted in a test
trailer or private room at each of the nursery
schools. Retesting in the spring occurred in the ap-
proximate order of testing used in the fall. This
ensured that the mean age of each group increased
by five months and that the distribution of age
within groups remained stable from fall to spring.


RESULTS 


Longitudinal and Cross-Sectional Age 
Gradients 


The present data provide two sets of
cross-sectional comparisons, that is, one
each for the sets of data from each testing
period. The data from the first testing pe-
riod are most appropriate for present pur-
poses and are presented in Figure 1 along
with the longitudinal gradient for the same
age range estimated from the present data.
The longitudinal gradient was constructed
by assessing the change in MA from the
first to the second testing period for the CA
range from 46 months to 52 months and
similarly for the CA range from 52 to 57
months, and adding these changes to the
known base for the youngest sample
(Schaie & Strother, 1968). As is apparent
from Figure 1, the longitudinal age gradient
provides estimates of age-related change
which exceed those of the cross-sectional
age gradient and suggests the existence of
facilitative effects related to schooling. In
addition, both gradients reveal a positive
relationship between CA and performance.

FIGURE 1. Cross-sectional and longitudinal age
gradients for nursery school children. (Abbrevi-
ations: CA = chronological age and MA = mental
age.) [Plot of mean MA in months against mean CA
in months; open circles = longitudinal, filled
circles = cross-sectional.]

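
The construction of the two gradients can be reproduced from the group means in Table 1. The sketch below is one reading of the procedure described above, not the authors' computation: it assumes the "known base" is the youngest sample's first-testing MA and that each longitudinal point is placed at the second-testing CA of the group contributing the gain. All variable names are hypothetical.

```python
# Group means from Table 1 (months): (CA at T1, CA at T2, MA at T1, MA at T2).
groups = {
    "young": (46.5, 51.8, 58.1, 69.3),
    "middle": (52.2, 57.1, 66.9, 78.1),
    "old": (56.8, 62.2, 69.3, 83.4),
}

# Cross-sectional gradient: each group's first-testing MA at its first-testing CA.
cross_sectional = [(ca1, ma1) for ca1, _, ma1, _ in groups.values()]

# Longitudinal gradient: start from the youngest group's first-testing MA (assumed
# base), then add the fall-to-spring MA gain of the young and then the middle group.
base_ca, _, base_ma, _ = groups["young"]
longitudinal = [(base_ca, base_ma)]
running_ma = base_ma
for name in ("young", "middle"):
    _, ca2, ma1, ma2 = groups[name]
    running_ma = round(running_ma + (ma2 - ma1), 1)
    longitudinal.append((ca2, running_ma))

print("cross-sectional:", cross_sectional)
print("longitudinal:   ", longitudinal)
```

On this reading the longitudinal estimate at a CA of about 57 months is 80.5 months of MA, against 69.3 months for the cross-sectional sample of similar age, which matches the direction of the difference described for Figure 1.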

Table 1 provides summary data related
to the chronological age, mental age, and
IQ for the samples of subjects at the two
time periods. Preliminary statistical analy-
ses for each of the measures revealed no
differences in performance attributable
either to sex of the subject or to the form of
the test which was administered to the sub-
jects. Therefore, the data were pooled. The
sets of data relating to MA and IQ were
each subjected to a 3 x 2 (Chronological
Age X Time of Testing) analysis of vari-
ance. Time of testing was a repeated meas-
ure.

The analysis for the data reflecting
changes in MA revealed statistical signifi-
cance both for the main effects of CA (F =
6.25, df = 2/60, p < .003) and time of
testing (F = 66.64, df = 1/60, p < .001). It 
is apparent, however, that the influences of 
time of testing represent sources of variance 
associated both with age change and with 
factors associated with school experience. 
Fortunately, the design permits an evalua- 
tion of performance differences between the 
two times of testing for matched CA sam- 
ples. Two such contrasts are possible, that 
is, at CA of 52 months and CA of 57 
months. The pooled within-cell error vari- 
ance (Winer, 1962) was used as the error 
estimate for the contrasts. Only the con- 
trast for the older samples reached statisti- 
cal significance (p < .05), even though both 
comparisons yielded data in the predicted 
direction. 
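
The two matched-CA contrasts can be illustrated with the Table 1 means. This is a descriptive sketch only: it pairs each group at spring testing with the next-older group at fall testing, which is an assumed reading of "matched CA samples", and it does not reproduce the significance tests, which need the pooled within-cell error variance. The names `ca`, `ma`, `contrasts`, and `matched_contrast` are hypothetical.

```python
# Mean CA and MA in months from Table 1, keyed by (sample, testing time).
ca = {("young", 1): 46.5, ("young", 2): 51.8,
      ("middle", 1): 52.2, ("middle", 2): 57.1,
      ("old", 1): 56.8, ("old", 2): 62.2}
ma = {("young", 1): 58.1, ("young", 2): 69.3,
      ("middle", 1): 66.9, ("middle", 2): 78.1,
      ("old", 1): 69.3, ("old", 2): 83.4}

# Each contrast pairs a group at spring testing with the next-older group at fall testing.
contrasts = {
    "CA of about 52 months": (("young", 2), ("middle", 1)),
    "CA of about 57 months": (("middle", 2), ("old", 1)),
}


def matched_contrast(more_school, less_school):
    """Return (CA gap, MA gap) for the more-schooled minus the less-schooled sample."""
    return (round(ca[more_school] - ca[less_school], 1),
            round(ma[more_school] - ma[less_school], 1))


results = {label: matched_contrast(*pair) for label, pair in contrasts.items()}
for label, (ca_gap, ma_gap) in results.items():
    print(f"{label}: CA gap = {ca_gap:+.1f} months, MA gap = {ma_gap:+.1f} months")
```

With ages matched to within half a month, the sample with more school experience has the higher mean MA in both contrasts, and the gap is larger at the older age, which is the contrast reported as statistically significant.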

The final statistical analysis involved a
comparison of IQ scores. The longitudinal
contrasts provide an alternate way of eval-
uating the influences of school experience in
that the age-related changes are controlled
when the mental age scores are transformed
to IQ. The results of this analysis revealed
statistical significance for the main effect of
time of testing (F = 21.98, df = 1/60, p <
.001). The absence of statistical significance
for the main effect of CA and the interac-
tion (both Fs < 1) suggested equivalence
of the samples in measured IQ, and statisti-
cal contrasts between matched CA samples
again revealed statistical significance for
the comparison involving the older samples
(p < .05) but not for the younger ones.
Such a conclusion is also suggested from the
longitudinal differences which reflect a
mean IQ difference of 5.1 points between
times of testing for the youngest sample,
with this difference increasing to 8.3 for the
oldest sample (see Table 1).


TABLE 1
SUMMARY DATA FOR YOUNG, MIDDLE, AND OLD
SAMPLES AT TWO TIMES OF TESTING

                         Measures
Sample        CA             MA              IQ
           T1     T2      T1     T2      T1      T2
Young     46.5   51.8    58.1   69.3    110.1   115.2
Middle    52.2   57.1    66.9   78.1    111.7   117.8
Old       56.8   62.2    69.3   83.4    110.6   118.9

Note. Abbreviations: CA = chronological age,
MA = mental age, and T = time.

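
The longitudinal IQ differences quoted in the text follow directly from the Table 1 means. The short sketch below recomputes them for all three samples; the dictionary names are hypothetical, and the middle sample's value is derived here rather than quoted from the text.

```python
# Mean IQ at the first and second testing from Table 1: (T1, T2).
iq = {"young": (110.1, 115.2), "middle": (111.7, 117.8), "old": (110.6, 118.9)}

# Fall-to-spring gain in mean IQ for each sample, rounded to one decimal place.
iq_gain = {name: round(t2 - t1, 1) for name, (t1, t2) in iq.items()}

for name, gain in iq_gain.items():
    print(f"{name:6s} IQ gain = {gain:+.1f} points")
```

The gains increase from the youngest to the oldest sample, with the middle sample falling between the two values cited in the text.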


Discussion 


The present results suggesting that 
schooling has effects on performance inde- 
pendent of those related to CA generally 
confirm the findings of studies (Baltes et 
al., 1970; Baltes & Reinert, 1969; Schaie, 
1972) with older children. In fact, the ex- 
amination of the MA scores for the three 
samples reveals that the longitudinal 
changes over the two testing periods exceed 
by 100% or more the relatively substantial 
changes which are attributable to CA alone 
(Dunn, 1965). Again, these results confirm 
the findings of others (e.g., Schaie, 1972). 

It is perhaps most important to discuss 
the implications of the present findings for 
educational and developmental research 
and measurement, and in the assessment of 
the effects of educational experiences. 

The primary issue to be considered here 
concerns the assessment of the effects of 
educational intervention (used in the broad 
sense) on performance over the period of a 
school year. In this context, a severe and 
limiting source of confounding is poten- 
tially present; that is, the influences of 
schooling are not discernible from other 
CA-related influences on performances. 
That is not to say that the impact of or 
effects of exposure to the school curriculum 
can be considered to be independent of be- 
havioral development. Rather, school learn- 
ing must be considered to be one of the 
components in the developmental process. 
If for no other reason than this, the effects 
of school experience must be estimated. 

The most obvious way in which to pro- 
vide estimates of the effects of experience 
on performance unbiased by behavioral de- 
velopment is to simulate or “accelerate” the 
process through the provision of massed 
training or practice. Such an experimental 
strategy is used very often in contemporary 
studies concerned with cognitive development
(e.g., Gelman, 1969; Sigel & Hooper,
1968). However, such approaches cannot be
generalized directly to school situations for 
a number of reasons: (a) It is not possible 
either to identify the range of experiences 
acquired in or as a direct result of the inter- 
action in school; nor is it possible to simu- 
late them in their entirety in controlled or 
laboratory situations. (b) Behavioral 
change induced through massed practice 
must, of necessity, be limited in scope. Also, 
attempts to generalize the findings to school 
situations are severely limited because of 
the possibility of an interaction between
time and the acquisition of the behavioral
phenomena of interest. In other words, the
products of school experience are acquired
over a long period and through a variety of 
media, including the teacher, age-mates, 
and nonschool situations prompted by 
school curriculum. There is no reason to 
expect that the effects of massed practice 
on specified tasks have effects which are 
isomorphic with those which are acquired 
as a result of schooling over the school year. 

There is a second way to provide direct 
estimates of the effects of school experience 
which are unbiased by independent time 
or age-related components of behavioral 
change. In the most simple case, the procedure
would involve the comparison of two
groups of children across time (e.g., the
school year) under conditions in which both
groups were eligible for acceptance into
school but one of the two groups was enrolled
in school and one was not. However,
it is extremely difficult to find a sample of
children who are of school age but who
have not been enrolled in school. And, even
if such a sample were available in the general
population, it would be impossible to
match them with children who were enrolled.
The very conditions which precipitated
the lack of enrollment would bias the
sample. As is apparent, the present experiment
utilized a research strategy which capitalizes
on the latter method while avoiding its
potential sources of confounding when
used.

The present data also relate to procedures
of test development, standardization,
and norming. There are two primary
implications of the present data. First, it has
already been mentioned that the norms for 
the Peabody Picture Vocabulary Test were 
developed according to cross-sectional sampling
procedures. This method implicitly
holds amount of schooling constant for sub- 
jects within a grade or, of course, for sub- 
jects who are not as yet enrolled in school 
(as was true for the normative sample of 
children below the age of six years). It is to 
be expected, then, that the IQ estimate 
would increase if the subjects were followed 
longitudinally over a period when the sub- 
jects were exposed to school or to other en- 
vironmental conditions which influence in- 
tellectual achievement. This would be espe- 
cially true for a test such as the Peabody 
Picture Vocabulary Test which measures
"verbal" intelligence. However, for within- 
grade comparisons, the degree of bias would 
not be expected to vary with the CA of the 
sample. 

Second, it is important to note that the 
sampling procedure used in norming the 
Peabody Picture Vocabulary Test enter- 
tains a quite different type of bias for pre- 
school children than for populations of chil- 
dren enrolled in school. Again it must be 
noted that the procedure used for norming 
the Peabody Picture Vocabulary Test in- 
volved a decidedly biased sample. That is, 
all children above the ages of six and a half 
years in the normative sample were enrolled 
in school and, because of the testing period 
(April-June), had been enrolled for most or 
all of a complete school year. Unfortu- 
nately, birth dates and birth rates in the 
population do not distribute themselves in
this manner but rather are distributed 
throughout the calendar year. This means, 
of course, that the test scores (MA or IQ) 
for school-age children who are tested in 
the months from July to March would be 
compared to those of a normative sample
which enjoyed from seven months to a minimum
of one month more schooling, respectively.
Such contrasts suffer from a
negative bias which would decrease in magnitude
as the date of testing approaches
that for the normative sample. Such a bias
may also exist for other tests, but the degree
and direction of bias would be expected
to vary with the type of sampling
procedure used in the test standardization 
and norming, in addition to the nature of 
the relationship between amount of school 
experience and performance. 

The present results also suggest the util- 
ity of norming tests using samples selected 
and tested at different times within a year. 
This suggestion has had its proponents 
(e.g., Baltes & Reinert, 1969; Lodge, 1938) 
but has never been formally implemented. 

Some final comments concerning the in- 
fluences of schooling are warranted. First, 
there is no intent to imply that the results 
attributed to the influences of school experi- 
ence in the present study are directly or 
exclusively attributable to the “in-class- 
room” experiences of the children. Rather, 
such influences may take many forms, 
ranging from the effects of the different 
forms of social interactions, environmental 
contexts, and parental or peer demands 
which confront the children while they are 
enrolled in school. It should also be empha- 
sized that the character of extraschool ex- 
periences may vary considerably for chil- 
dren who are and who are not enrolled in 
school during the same CA period. In the 
present study, for example, the three CA 
samples entered school at different CAs 
and, therefore, spent varying amounts of 
time at home. The influences of such factors 
cannot be discounted in the present results. 
Last, the possibility exists that schooling 
may have indirect effects on performance 
even before the child enters school. As an 
example, parents may react to the child’s 
impending entry into school by providing 
certain forms of educational experiences 
(e.g., reading, counting, etc.) at home to
“prepare” the child for school. There is evi- 
dence to support such a contention (Hay & 
Goulet, 1973), although such factors were 
not of interest in the present study.


DOLL PREFERENCES:
AN INDEX OF RACIAL ATTITUDES?


PHYLLIS A. KATZ AND SUE ROSENBERG ZALK


A doll choice task was administered to 192 nursery and kindergarten
children by same- and other-race examiners. Half of the children were 
white and half were black. In contrast to earlier studies, male and 
female dolls were presented which differed only in skin color, not hair 
or eye color. The strong preference for white dolls found by previous 
investigators was not obtained. Young children exhibited a slight pref- 
erence for other-race dolls, although gender cues were more significant 
determinants of choice behavior than were skin color cues. Children’s 
responses were, in part, a function of the tester’s race. Stronger preferences
for same-race dolls were exhibited in the presence of a same-race
examiner.


The acquisition of racial attitudes in 
young children has frequently been studied. 
Contrary to the opinion of the lay public, 
the most consistent finding has been that 
preschool children are very much aware of 
racial cues. By the age of four, most children
can correctly identify skin color. Moreover,
there appears to be a considerable degree
of evaluative content associated with racial
cues (e.g., Asher & Allen, 1969; Clark &
Clark, 1947; Greenwald & Oppenheim, 
1968; Porter, 1971). 

The assessment of young children’s racial 
attitudes typically utilizes a task involving 
doll choices. The child is presented with 
several dolls and is asked to choose the doll
that goes with either a positive or negative
characteristic (i.e., which is the good doll). 
Clark and Clark (1947) originally demon- 
strated that preschool children (both black 
and white) tend to associate positive attributes
with the white doll and negative ones
to the black one. This finding has been 
replicated many times. Both the consistency 
of these results and the seemingly straight- 
forward nature of the doll task undoubtedly 
account for its continued popularity as an
index of young children’s racial attitudes 
and perceptions. 

Despite its general acceptance, however, 
the typical administration of a doll prefer- 
ence task raises a number of interpretive 
and methodological problems. The first one 
concerns the psychometric characteristics of 
the instrument itself. Neither reliability nor 
validity data have ever been presented. It
has never been shown that children’s pre- 
ferences are consistent over time nor that 
a child's response on a doll task is related 
to future attitudes. Moreover, the only 
study attempting to relate doll preferences 
to intergroup behavior (friendship choices 
in nursery school) found no relationship 
(Hraba & Grant, 1971). Thus, the validity 
of this measure remains to be demonstrated. 
The assumption made by previous investi- 
gators that a prejudiced child will remain 
so clearly needs substantiation.

A second problem has to do with interpreting
the relative importance of racial
cues in the child's judgment. The impres- 
sion earlier studies impart is that skin color 
cues are very significant indicators of a 
young child’s perception of people. A re- 
current methodological difficulty, however, 
has been the consistent confounding of skin 
color with other cues. White dolls typically 
have blue eyes and blonde hair, whereas 
black dolls have brown eyes and brown hair. 
The possibility exists, therefore, that chil- 
dren may have based their choices on cues 
other than skin color. An additional diffi- 
culty is that only female dolls are ordinarily 
displayed, thus eliciting preferences that 
may not be generalizable to males. More- 
over, when a choice is limited to only race 
cues, the strength of those cues relative to 
other person characteristics (such as gender) 
may be seriously overestimated. 

A final problem with most earlier studies 
concerns the type of examiner employed. 
Recent evidence suggests that the race of 
the examiner may be an important variable 
influencing children’s performance in a wide 
variety of contexts (Bucky & Banta, 1972; 
Katz, 1973; Katz, Johnson, & Parker, 1970; 
Sattler, 1970). Nevertheless, most earlier 
studies with preschool children have not 
varied the race of the examiner. 

The present study attempted to clarify 
some of these methodological ambiguities 
in children’s doll preference behavior. In 
view of the previously raised considerations, 
dolls were employed which varied only in 
skin color, with hair and eye color constant. 
Gender cues were provided in order to 
assess the relative strength of racial and 
sexual preference. An inquiry procedure was 
included in which reasons for the child’s 
choice were elicited. The purpose of this 
latter task was to ascertain whether chil- 
dren’s verbal behavior was related to doll 
choices. Finally, half of the subjects in each 
group were tested by a black and half by 
a white tester in order to explore the effects 
of race of examiner. 


METHOD


Subjects 


A sample of 192 subjects was employed, equally
divided according to age, race, and sex. The mean
chronological ages of the younger and older groups
were 3 years 11 months and 5 years 2 months, 
respectively, with corresponding standard devia- 
tions of 5 and 6 months. 

Subjects were drawn from several sources. Older
children were randomly selected from kindergarten 
classes of an integrated public elementary school 
located in a lower to lower-middle income area of 
New York City. The younger children were 
selected from two nursery schools in the same 
geographical area. At the first nursery school (a
church-associated one), 75% of the students were 
white and 25% were black. The second nursery 
sample was drawn from a city-run nursery school 
which was predominantly black and which was 
located in a neighborhood that was of a somewhat 
lower socioeconomic level. Analyses of the responses
of the black children, however, revealed
no differences between the two nursery settings 
so that they were pooled in subsequent analyses. 


Doll Preference Task 


Two black and two white dolls, one of each sex,
were introduced to the children. The dolls were 
approximately 10 inches tall, made of rubber, and 
were all cut from the same mold. They all had 
brown eyes. The same dark brown wigs were used 
on all dolls. The hair was straight and coarse 
textured and was judged by adults (both black 
and white) to be appropriate for either race. Facial
features were constant across skin color, and noses
and mouths were judged by adults to be ambiguous 
enough to be either European or African toddlers. 
The male dolls had short hair and were dressed in
a white shirt and red pants; the female dolls had
longer hair and wore white shirts and red 


The children were asked to choose (a) the doll they
like best; (b) the doll they didn’t like as much as
the others; (c) the good doll; (d) the bad doll; (e)
the doll that was a nice color; (f) the doll that was
not a nice color; and (g) which doll they would
prefer to take home.

In addition, three identification questions were
employed in which the child was asked to point to
(a) the doll that looked most like him or her;


questions, an inquiry procedure was employed in
which the child was encouraged to give reasons
for his choice, for example, Why is that one a good
doll? The frequencies of doll choices made to each
question were tabulated for each group. In addition,
a composite race preference and sex preference
score was obtained for each subject for the positive
and negative items. An accuracy of identification
score was also obtained. Finally, verbal responses
were categorized according to the attributes spontaneously
mentioned in the inquiry portion.


General Procedure 


Each subject was individually tested in a room
at their respective school. Within each
racial group, half the subjects were tested by a
white examiner and half by a black examiner. The
examiners were four females, in their middle
twenties, who had had previous experience working 
with young children. 


RESULTS 


There are a number of ways of analyz- 
ing the choices that children made with 
regard to the four doll choices. In earlier 
studies, frequencies to each question have 
been tabulated and nonparametric tests 
have been conducted. It was felt that a more 
efficient way of handling the data in the 
present study was to combine responses to 
the various questions and to assign each 
child a racial and sexual preference score. 
The racial preference score was determined 
by the number of times that the child chose 
a same-race doll for positive items and an 
other-race doll for negative items. Similarly, 
a sexual preference score for each child was 
determined on the basis of the number of 
times the child chose a same-sexed doll for 
positive characteristics and other-sexed doll 
for negative characteristics. 
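
The scoring rule just described can be sketched in a few lines. This is only an illustration, not the authors' procedure as implemented: the item names are hypothetical labels, the split into four positive and three negative items is inferred from the seven choice questions listed in the Method section, the per-item scaling to a 0 to 1 range follows the description given under Racial Preference, and the unweighted mean used for the combined score is an assumption.

```python
# Hypothetical item labels; the 4/3 split is inferred from the seven
# choice questions in the Method section, not stated as a list by the authors.
POSITIVE_ITEMS = ("like_best", "good", "nice_color", "take_home")
NEGATIVE_ITEMS = ("like_least", "bad", "not_nice_color")


def racial_preference_scores(choices, child_race):
    """Return (positive, negative, combined) scores, each between 0 and 1.

    `choices` maps an item label to the race of the doll the child chose.
    """
    # A positive item counts when the child picks a same-race doll.
    pos_hits = sum(choices[item] == child_race for item in POSITIVE_ITEMS)
    # A negative item counts when the child picks an other-race doll.
    neg_hits = sum(choices[item] != child_race for item in NEGATIVE_ITEMS)
    # Divide by the number of items in each subtest to make them comparable.
    positive = pos_hits / len(POSITIVE_ITEMS)
    negative = neg_hits / len(NEGATIVE_ITEMS)
    # Assumption: the combined score is the unweighted mean of the two.
    combined = (positive + negative) / 2
    return positive, negative, combined


# Made-up responses for one white child, for illustration only.
example_choices = {
    "like_best": "white",
    "good": "black",
    "nice_color": "white",
    "take_home": "black",
    "like_least": "black",
    "bad": "white",
    "not_nice_color": "black",
}

p, n, c = racial_preference_scores(example_choices, "white")
print(f"{p:.2f} {n:.2f} {c:.2f}")  # prints 0.50 0.67 0.58
```

A sex preference score would follow the same pattern, with the doll's sex and the child's sex in place of race.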


Racial Preference 


The average racial preference scores for
the positive and negative items are contained
in Table 1. Since there was an unequal
number of positive and negative items,
each average score was further divided by
the number of items within each subtest
in order to make them comparable. Thus,
the possible range was from 0 to 1. Lower
scores indicate choices of other-race dolls
for positive items and same-race dolls for
negative attributes. High scores indicate the
opposite pattern, that is, same-race choice
for positive items and other-race choice for
negative items. A score of .5 indicates a
chance distribution of choices. It can be
seen in Table 1 that, for the most part, the
children’s scores did not differ much from
chance expectancy. This trend was in contrast
to earlier results obtained by Clark
and Clark (1947) and others, which showed
that both black and white preschool children
have strong preferences for white dolls.
If the results presented in Table 1 conformed
to earlier results, the white children
should have averaged about .7, whereas the
black subjects should have received scores
of approximately .3 (indicating other-race
choices).


TABLE 1
AVERAGE RACIAL PREFERENCE SCORES ON DOLL CHOICE TASK

                          Race of            Type of item
Age/race of subject       examiner   Positive  Negative  Combined

Nursery
  White                   same         .44       .52       .48
                          other        .38       .40       .39
  Black                   same         .29       .44       .36
                          other        .34       .45       .40
Kindergarten
  White                   same         .63       .65       .64
                          other        .40       .36       .38
  Black                   same         .55       .52       .54
                          other         E        .50       .47


A four-way analysis of variance conducted
on the racial preference scores (Age
X Race of Examiner X Race of Subject X
Type of Item) revealed the following effects
to be significant: Age (F = 3.90, df =
1/184); Type of Item (F = 4.58, df =
1/184); Race of Examiner X Race of Subject
(F = 3.92, df = 1/184); and Age X
Type of Item (F = 4.18, df = 1/184). The
age effect indicates that nursery school chil- 
dren expressed less prejudice than kinder- 
garten children, a mean of .41 as compared 
to .51. Actually, the overall average for the 
older children, as noted above, did not differ 
from chance expectation, and the lower 
score obtained by the younger children indicates
a slight preference for other-race
dolls. Doll choices are influenced by whether 
children are being tested by a same- or 
cross-race examiner. The significant Race of 
Examiner X Race of Subject interaction 
reflects the finding that children are more 
prone to select dolls of their own skin color
for positive items and other-race dolls for 
negative attributes when tested by an ex- 
aminer of the same race. This trend appears 
more pronounced in white than in black 
children. The performance of the white 
kindergarten children is particularly interesting
since they show a preference for
white dolls with the white examiner but a 
preference for black dolls with the black 
examiner. The type of item effect reflects
the finding that children had a greater
tendency to express attitudes of greater
tolerance with regard to positive rather than
negative attributes. This tendency was more
pronounced for the nursery group, as indicated
by the Age X Type of Item interaction.


TABLE 2
AVERAGE SEX PREFERENCE SCORES ON DOLL CHOICE TASK

                                 Type of item
Age/sex of subject      Positive  Negative  Combined

Nursery
  Male                    .48       .50       .49
  Female                  .64       .50       .57
Kindergarten
  Male                    .50       .48       .49
  Female                  .79       .58       .68


Sexual Preference 


Since the children could choose the dolls 
on the basis of either skin color or gender 
cues, an analysis of variance was also conducted
on the sex preference scores of the
children. These scores are contained in 
Table 2. 

Sex preference scores were obtained in the 
same way as the race preference scores. A 
mean of .50 indicates no preference, a higher 
score indicates preference for same-sexed
dolls on positive items and other-sexed dolls 
on negative ones, whereas a lower score 
reveals the reverse pattern. 

The analysis revealed the main effects
of sex of subject (F = 12.03, df =
1/176), type of item (F = 10.68, df =
1/176), and the Sex of Subject X Type of
Item interaction (F = 9.40, df = 1/176) to
be significant. As can be seen in Table 2, the
significant results indicate that girls had 
much stronger preferences for same-sexed 
dolls (M = .63) than did boys (M = .49), 
particularly with regard to positive attri- 
butes. None of the other main or interaction 
effects were significant in the analysis. 

A comparison of Tables 1 and 2 is in- 
teresting in that it indicates that, at least 
for girls, gender appears to be a much 
stronger cue for doll choice than skin color.


Racial Identification


The average racial identification scores 
are presented in Table 3. Since there were 
three racial identification questions, the 
possible range of scores on this measure was 
from 0 to 3, with higher scores associated 
with greater accuracy. As can be seen in 
Table 3, most of the children had very high 
scores on this measure. An analysis of vari- 
ance revealed a significant age effect (F = 
11.62, df = 1/176) with nursery subjects 
obtaining a mean of 2.43 as contrasted with 
2.73 for the older children. In addition, the 
interaction of Age X Race of Subject X Sex
was significant at the .05 level (F = 6.51, 
df = 1/176). As can be seen in Table 3, this 
interaction appears to be accounted for 
primarily by the relatively low scores of the 
black male nursery school children. Interestingly,
this is the same group which shows a
stronger tendency to select white dolls for 
positive items. This accounts for the only 
significant intermeasure correlation ob- 
tained, namely, a correlation of .27 between 
racial identification and racial prejudice 
scores. 


Relation of Verbal and Choice Behavior 


It should be recalled that following the
doll choice tasks, subjects were explicitly
asked to give the reasons for their choice.
Perhaps the most interesting finding with
regard to how children describe their doll
choices is that they say remarkably little
about color. Each of the 192 subjects was
asked seven questions about the dolls. Out
of a total of 1,344 possible responses, only
66 (5%) were made on the basis of color or
other racial characteristics.


TABLE 3
AVERAGE RACIAL IDENTIFICATION SCORES

                                Age
Sex/race of subject     Nursery   Kindergarten

Male
  White                  2.07       2.71
  Black                  2.00
Female
  White                  2.46
  Black
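
The response count reported for the inquiry procedure can be checked with simple arithmetic. The sketch below uses only the three figures given in the text (192 subjects, seven questions each, 66 color-based responses); the variable names are illustrative.

```python
# Figures taken from the text; variable names are illustrative only.
subjects = 192
questions_per_subject = 7
color_based_responses = 66

# Total number of responses that could have mentioned color.
possible_responses = subjects * questions_per_subject
# Proportion of responses based on color or other racial characteristics.
share = color_based_responses / possible_responses

print(possible_responses)   # prints 1344
print(f"{share:.1%}")       # prints 4.9%
```

The proportion comes to about 4.9%, which agrees with the rounded 5% reported.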

It may be that the children were simply 
inhibited about stating such reasons for 
choices. If this were so, however, it would 
be expected that more color descriptions 
would emerge with a same-race examiner. 
No differences, however, were attributable 
to race of examiner. Moreover, if inhibition 
was involved, the older children should give 
fewer color responses. Actually, they gave 
slightly more (again, a nonsignificant difference).
More typical of the verbal responses
given by children were the following:
His hair is too short; She bothers him too 
much; He shouts too much, etc., which 
referred to either gender cues, clothing, or 
imagined personality characteristics. Thus, 
it may be concluded that children did not 
respond overtly to racial cues on the doll 
task with great frequency. 


DISCUSSION 


The pattern of results obtained indicates 
that responses based upon racial cues are 
already quite complex, even in three-year- 
old children. Although preschool children 
are not ordinarily considered to be sophisticated
test takers, it was clear that the children
were responding to many subtle nuances
of the situation. This was particularly
clear in the white kindergarten sample, where 
doll selections apparently were geared to the 
race of the tester. This ambivalence about 
expressing attitudes was not characteristic 
of the older black children who exhibited a 
slight preference for black dolls. 

The overall findings obtained with regard 
to children’s doll choices were clearly not in 
accordance with the clear-cut preferences 
for white dolls previously reported by most 
earlier investigators. It would indeed be 
tempting to interpret the present findings as 
indicative of a general lowering of children’s
prejudicial attitudes. A few recent investigators,
also finding that young black children
do not show strong white preferences,
have, in fact, attributed this to both general
societal change and the increasing pride
associated with black identity (Datcher,
Savage, & Checkosky, 1973; Hraba &
Grant, 1971). A complete look at research in
the area, however, suggests that a more
complex explanation may be needed. The 
Clarks’ early findings with dolls have been 
replicated quite recently in their original 
community (Asher & Allen, 1969). More- 
over, preferences for white characters in 
picture choice tasks have been found by 
many investigators (e.g., Williams & Ro- 
berson, 1967). It is, of course, possible that 
black is more beautiful to children in certain 
parts of the country (e.g., large urban cen- 
ters) than in others, although no such pat- 
tern is easily discernible on the basis of 
available evidence. 

It is the view of the present investigators 
that the discrepant findings of the present 
study are more likely due to procedural dif- 
ferences than to either historical change or 
geographical variations. A number of ex- 
planations are possible in this regard. One 
possibility is that when given another basis 
for choice (i.e., gender), skin color is not
as salient for young children. Since only 
skin color was varied in the present study, 
a second possibility is that earlier results 
reflected a preference for cues other than 
skin color. Children, like the proverbial 
“gentlemen,” may prefer blondes. A third 
possibility is that in earlier studies subjects 
were not actually expressing their own 
racial preference, but were rather giving 
what they regarded as the socially desirable 
response anticipated by the examiner. Thus, 
responses to dolls of varying skin color may 
not necessarily be related to behavior out- 
side of the immediate testing situation. To 
date, no evidence exists to suggest they are. 
In summary, there appears to be ample 
reason to question both the validity of some 
of these earlier findings and whether the task 
itself is an appropriate indicator of a child’s 
attitude.


INDIRECT REVIEW AND PRIMING THROUGH QUESTIONS¹


ERNST Z. ROTHKOPF² AND MARJORIE J. BILLINGTON
Bell Laboratories, Murray Hill, New Jersey 


Three experiments were performed to determine (a) whether attempts 
to answer a question facilitate the retention of other topically related 
material that is not directly relevant to the original question (indirect 
review); and (b) whether performance on a test item is better if a 
question topically related to the test item has recently been asked 
(priming). Evidence for indirect review was obtained following study of 
a 6,000-word text using adjunct questions. No priming effects were ob- 
served. The indirect review phenomenon suggests that searching one's 
memory to answer a question strengthens or makes more available a 
system of semantically related memory features broader than the 
memory requirements for the initial question. 


The purpose of these experiments was to 
explore two questions: (a) Do attempts to 
recall recently read material aid the reten- 
tion of other material that is topically re- 
lated to the initial recall attempts (indirect 
review); and (b) Is performance on a test 
item better if a question topically related to 
the test item has recently been asked (prim- 
ing)? 

These experimental problems were 
prompted by relatively broad issues in 
conceptualizing those properties of human 
memory that bear on instruction. Memories 
evoked by external events such as questions 
are thought of as broadly reconstructive of 
the experiences that have produced these 
memories. Evocations of internal representations
are assumed to propagate along dimensions
of semantic similarity such as
common topics and contexts. 

Consequently, it is plausible to argue that
external events such as questions can result
not only in (a) the activation of memory
structures specifically required to answer
each question but also in (b) the activation
of internal representations which are topically
related to adjunct questions and substantially
broader in scope. The activation
of other topically related memories implies
that the chain of hypothesized internal
events resulting from a question can facilitate
other subsequent test performances.
Facilitation occurs because (a) the internal
representations, through the principle of
exercise, have general instructive properties
that produce relatively permanent changes
in memory (review); or (b) the question
temporarily activates topically related domains
in memory and therefore makes sufficient
responses to other topically related
questions temporarily more accessible or
more likely (priming).


¹ We are indebted to Nancy Snidman of Fairleigh
Dickinson University in Madison, New Jersey,
for her help in conducting this experiment.

² Requests for reprints should be sent to Ernst
Z. Rothkopf, Bell Laboratories, 600 Mountain
Avenue, Murray Hill, New Jersey 07974.

The direct instructive effect of adjunct 
questions which has been repeatedly ob- 
served (Rothkopf, 1966; Rothkopf & Bis- 
bicos, 1967; Rothkopf & Bloom, 1970) can 
be understood in terms of the relatively 
permanent strengthening, through review, of 
evoked memorial representations directly 
related to the question. Postreading per- 
formance on the questions that have been 
seen during the course of study was found 
to be substantially increased even though 
there was no opportunity for rereading re- 
lated portions of the text. The direct instruc- 
tive effect has been found to be greater 
when informative feedback is provided after 
answering the adjunct question, but it is of 
substantial magnitude even when no direct 
feedback is available to the subject (Roth- 
kopf, 1971). It has been studied mainly in 
situations where the content of the memory-
evoking question has been identical to the
test question on which the effect of exercise 
was measured (e.g., Rothkopf & Bisbicos, 
1967; Rothkopf & Bloom, 1970), where it 
has been related to the criterion question 
by a simple syntactic or semantic transfor- 
mation (Rothkopf & Coke, 1966), or in 
which the required criterion test informa- 
tion was a mediating term in the logical 
structure relating the adjunct question to 
the source text (Frase, 1969). 

Evidence for priming has been provided 
by a recent study by Meyer (1973). He has 
reported that the time required to judge 
that a string of letters was an English word 
was shorter when the presentation of the 
word was preceded by the judgment of 
another semantically related rather than an 
unrelated word. Similar effects involving 
the likelihood of certain responses have been 
reported in word association studies (e.g., 
Martin, 1964).

Some recent observations suggest that it 
may be possible to measure review or prim- 
ing effects which were attributable to topical 
propagation of memory processes evoked 
by questions beyond the representations 
needed to answer the question. McGaw and 
Grotelueschen (1972) reported that post- 
reading test performance was elevated on 
questions that have substantial elements in 
common with adjunct questions but could 
not be directly answered from the material 
in the adjunct question. For example, post- 
reading performance on 


Question 1, 


The surveying ship, which recovered starfish 
from a depth of 1,260 fathoms in 1860, was ex- 
ploring a route for a cable from Faroe to ? . 


was increased by seeing the following adjunct 
question during the course of study: 


Question 2, 


The surveying ship ? , which recovered starfish 
from the depth of 1,260 fathoms in 1860, was ex- 
ploring a route for a cable from Faroe. 


Questions 1 and 2 were derived from a text 
segment that read as follows: 


Then from the surveying ship Bulldog, examin- 
ing a proposed northern route for a cable from 
Faroe to Labrador in 1860, came another report.
The Bulldog’s sounding line, which at one place
had been allowed to lie for some time on the
bottom at a depth of 1,260 fathoms, came up
with 13 starfish clinging to it.


McGaw and Grotelueschen’s adjunct
questions may have provided the occasion
for the review or evocation of memory 
structures which were topically related to 
the adjunct questions but which were suffi- 
ciently broad in scope to facilitate per- 
formance on other questions about that 
topic. It is as if the subject, in searching his 
memory for an answer to Question 2, also 
strengthens or makes more available certain 
memory features that were critical for Ques-
tion 1 but not for Question 2.

The conditions induced by questions which
dispose toward indirect review closely
resemble those required for priming effects.
Both the indirect review and the priming
hypotheses are invoked to account for the
improved performance on a question that is a
consequence of an earlier experience with a
question on a closely related topic. The critical
difference between the two hypothesized
processes is the dependence of the priming
mechanism on relatively short temporal
intervals between the two related questions.
Priming effects such as those reported in
word association studies (e.g., Martin, 1964)
disappear quickly in time. No time effect is
assumed for the indirect review effect since
the process presumably affects relatively
permanent representations in memory.

The first and second of the experiments
were aimed at exploring review effects
produced by adjunct questions. The second and
the third experiment involved attempts to
investigate priming effects produced by
questions.


EXPERIMENT 1 
Method 


General Procedure 


Subjects read a 24-page typewritten text
with two adjunct questions embedded every
4 pages. The adjunct questions were derived from
the material in the 4 pages that had just preceded
the questions. Immediately after reading,
subjects were tested. This criterion test included
questions closely matched to the adjunct questions
as well as some test items that were unrelated to
the adjunct questions. A second test covering the
adjunct questions was administered subsequently
(adjunct question retest).

TABLE 1
DESIGN OF EXPERIMENT 1

Group            AQ    CT      AQRT    Test order
A1 sequential    A1    A2B2    A1B1    sequential
A1 random        A1    A2B2    A1B1    random
B1 sequential    B1    A2B2    A1B1    sequential
B1 random        B1    A2B2    A1B1    random
A2 sequential    A2    A1B1    A2B2    sequential
A2 random        A2    A1B1    A2B2    random
B2 sequential    B2    A1B1    A2B2    sequential
B2 random        B2    A1B1    A2B2    random

Note. The n for each group is 15. Abbreviations:
AQ = adjunct question, CT = criterion test,
AQRT = adjunct question retest, M = items
matched to AQs, M̄ = unmatched items, and
Ma = items matched to 12 items in the criterion
test.


Materials 


An approximately 6,000-word passage from 
Rachel Carson's The Sea Around Us was used.* It
was typed on 24 full pages. Two questions were 
constructed for each page for a total of 48 ques- 
tions. The 2 questions from each page were 
matched in that they asked for information about 
closely related elements from that page, fre- 
quently from the same sentence, and generally 
had several phrases in common. However, each 
pair was constructed so that one question of the 
pair could not be answered by inspecting the other. 
An example of a pair of matched questions was
given in the introduction. 

The matched pairs of questions from each con-
secutive set of four pages were arbitrarily divided
into two groups of pairs, A and B, for a total of
12 pairs in each group. The only restriction on the
division of the matched pairs into two groups
was that the first, second, third, and fourth pages
of the four-page interquestion text segments
were equally represented in A and B.

Four sets of 12 questions each were assembled
in the following manner: A1, composed of one
member of each matched pair of Set A; A2, com-
posed of the other member of each pair in Set A;
B1, composed of one member of each matched
pair in Set B; and B2, composed of the other ques-
tion in each matched pair in Set B.


* Permission for the experimental use of these
copyrighted materials was kindly granted by the
publishers, Oxford University Press, 200 Madison
Avenue, New York, New York 10016.


Design


The experimental design is shown in Table 1.
One fourth of the subjects were given A1, A2, B1,
or B2, respectively, as experimental adjunct
questions. For the subject assigned to A1 or B1, the
criterion test consisted of the 12 A2 and the 12
B2 questions. It should be noted that half of the
items of the criterion test were matched to adjunct
questions while the remainder were not. Matched
and unmatched items were mixed together in a
nonsystematic manner. They will be referred to
as M and M̄, respectively. Subsequent to the
criterion test, subjects were given another test,
the adjunct question retest, that consisted of the A1
and B1 questions; that is, half of the adjunct
question retest was composed of the previously
seen adjunct questions, while the remaining items
consisted of questions never seen before but which
happened to be matched to 12 items in the criterion
test. The latter will be referred to as Ma. For
subjects assigned A2 or B2 as adjunct questions,
the criterion test consisted of the questions from
Sets A1 and B1. These subjects were given the
questions from Sets A2 and B2 as the adjunct
question retest.
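
The counterbalancing just described can be sketched in a few lines of Python. This is only an illustration of the assignment logic stated in the text; the function name, dictionary keys, and the spelling of the set labels (A1, A2, B1, B2, M_bar, Ma) are assumptions introduced here and are not part of the original materials.

```python
# Sketch of the Experiment 1 counterbalancing described in the Design section.
# Set labels follow the text; function and key names are illustrative assumptions.

PARTNER = {"A1": "A2", "A2": "A1", "B1": "B2", "B2": "B1"}  # other half of each matched set


def compose_tests(adjunct_set):
    """Return the role played by each question set for a given adjunct question set."""
    matched = PARTNER[adjunct_set]                  # M: criterion items matched to the AQs
    other_family = "B" if adjunct_set[0] == "A" else "A"
    unmatched = other_family + matched[1]           # M-bar: criterion items with no matching AQ
    retest_unseen = PARTNER[unmatched]              # Ma: unseen retest items matched to M-bar
    return {
        "criterion_test": sorted([matched, unmatched]),
        "retest": sorted([adjunct_set, retest_unseen]),
        "M": matched,
        "M_bar": unmatched,
        "Ma": retest_unseen,
    }


for aq in ("A1", "B1", "A2", "B2"):
    print(aq, compose_tests(aq))
```

Under this reading, every subject receives 12 matched and 12 unmatched items on the criterion test, and each question set serves in each role for one fourth of the subjects.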

For exploratory purposes, the test questions 
of the criterion test and the adjunct question 
retest, respectively, were presented to half of the 
subjects in each adjunct question group in the same 
ordinal sequence in which the materials under- 
lying these items occurred in the text. For the 
remaining subjects, the sequential order of items 
of the criterion test and the adjunct question re- 
test were randomly arranged. Fifteen subjects 
were assigned to each of the eight adjunct question 
by test-sequence combinations for a total of 120 
subjects. 


Procedure 


The subjects received a package of four en- 
velopes (see Rothkopf & Coke, 1968, for details 
of this procedure). The outside of each envelope 
was marked with a large numeral which uniquely 
identified it. Envelope Number 1 contained the 
passage with the embedded adjunct questions; 
Number 2 contained a short answer test (criterion 
test); Number 3 contained another short-answer 
test (adjunct question retest); and Number 4 
contained a background questionnaire to deter- 
mine the subject's familiarity with the materials. 
Inside each envelope was a distinctively colored 
sheet giving detailed directions on how to use the 
materials contained in the envelope. The direc- 
tions were to study the text carefully and not to 
reinspect any page that they had already read. 
Responses to adjunct questions were written into 
the designated answer space on the question blank 
and no feedback as to the appropriateness of the 
answer was provided. A monitor took care to as- 
sure that all subjects worked through the enve- 
lopes in the proper order. 

Study and test time were controlled by the
subjects. The subjects were run in small groups of
4-10 persons. Each subject was seated in a small
cubicle.

TABLE 2
AVERAGE NUMBER CORRECT RESPONSES FOR
VARIOUS EXPERIMENTAL CONDITIONS WITH
SEQUENTIAL AND RANDOM TEST ORDERS ON
THE CRITERION TEST AND THE ADJUNCT
QUESTION RETEST

                 Criterion test            Adjunct question retest
Test order    Matched   Not matched   Previously seen        Not previously
                (M)        (M̄)        (adjunct questions)    seen (Ma)
Sequential     3.88       3.43            5.52                  3.40
Random         3.27       2.92            5.                    3.08
M              3.58       3.17            5.70                  3.24


Subjects 


Paid volunteer college students (n = 120) 
served as subjects. None had 15 or more credit 
hours of college-level biology (including the cur- 
rent semester), and none stated that they had 
read The Sea Around Us in the recent past. 


Results 


Criterion Test 


The correct responses on the criterion test
were subjected to a 4 × 2 × 2 analysis of
variance. The factors were the four sets of
adjunct question conditions, test sequence
(sequential vs. random), and matched versus
unmatched items, with repeated measures
on the last factor. The data pertinent to this
analysis as well as the other main data from
the criterion test and adjunct question
retest are summarized in Table 2. The average
number of correct responses on matched (M)
items (X̄ = 3.58, σ = 2.01) over all
conditions was significantly greater (F = 7.26,
df = 1/112, p < .01) than that for
unmatched test questions, M̄ (X̄ = 3.17,
σ = 1.77). This result was consistent with
the findings reported by McGaw and
Grotelueschen (1972).

Performance on test questions sequenced
in the same order as the text (X̄ = 3.66,
σ = 1.93) was higher than on random ar-
rangements of test items (X̄ = 3.09, σ =
1.83), but the difference between the two
conditions was not significant (F = 3.41,
df = 1/112, .10 > p > .05). The some- 


ERNST ROTHKOPF AND MARJORIE BILLINGTON 


what more accurate recall for the nonrandom
test order suggests the possibility of some
priming effects since questions from con-
tiguous text segments tend to be more
closely related topically than randomly or-
dered questions. There were no significant
differences among the four adjunct question
sets nor were there any significant interac-
tions.


Adjunct Question Retest 


The adjunct question retest consisted of
24 test questions. Of these, 12 had been seen
before during the course of study as adjunct
questions. A 4 × 2 analysis of variance was
performed on correct responses on the
adjunct question retest. The factors were the
four adjunct question sets and whether the
items had been seen or not as adjunct
questions, with repeated measures on subjects
on the last factor. Previously seen adjunct
question retest items (adjunct questions)
produced more correct responses (X̄ = 5.70,
σ = 2.65) than those not previously seen,
Ma (X̄ = 3.24, σ = 1.89; F = 99.45,
df = 1/116, p < .01). The adjunct question
set factor and the interaction were not
significant.


Interval Effects 


There was some suggestion in the present
data that the time interval between inspection
of the relevant text segment and adjunct
question has some bearing on the
indirect review effect. The data relevant to
this conjecture are given in Table 3, which
was derived from the following analysis. The
test questions used in the criterion test were
divided into four groups of 12 each depending
on whether they were from the first,
second, third, or fourth pages of the four-page
text segments that preceded adjunct
questions in the text. The average number
of correct responses per item was calculated
for each page location. This was done
separately for the M and the M̄ items. The
results, tabulated respectively in columns 2
and 3 of Table 3, indicate that the indirect
review effects produced by a related
(matched) adjunct question were greatest
when the adjunct question pertained to the
text page that immediately preceded it.

TABLE 3
AVERAGE CORRECT RESPONSES ON THE CRITERION
TEST FOR ITEMS FROM FOUR LOCATIONS IN
EACH FOUR-PAGE TEXT SEGMENT

Page    Matched (a)    Not matched (a)    Difference
1           9.25           10.00             -.75
2           7.67            6.33             1.34
3           8.00            6.67             1.33
4          10.83            8.75             2.08

(a) The figures shown in these columns are
averaged over test items rather than over individual
subjects. The column means, which would be based
on 48 items, may be converted by multiplying with
48/120 to the matched and unmatched item data
described in the first paragraph of the Results
section. The latter have been averaged over 120
subjects.
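
The conversion described in the table footnote can be checked with a short calculation. The sketch below is illustrative only: the variable and function names are assumptions, and the Page 2 matched value is taken as 7.67, which is the value implied by the tabled difference of 1.34.

```python
# Table 3 item means by page location (Pages 1-4).
# Assumption: the Page 2 matched value is 7.67, as implied by the Difference column.
matched = [9.25, 7.67, 8.00, 10.83]
unmatched = [10.00, 6.33, 6.67, 8.75]

ITEMS = 48       # test questions contributing to each column mean
SUBJECTS = 120   # subjects in Experiment 1


def per_subject_mean(item_means):
    """Convert a mean over items into a mean over subjects (factor 48/120)."""
    column_mean = sum(item_means) / len(item_means)
    return column_mean * ITEMS / SUBJECTS


print(f"matched:   {per_subject_mean(matched):.3f}")
print(f"unmatched: {per_subject_mean(unmatched):.3f}")
```

The two converted values fall within rounding of the per-subject means of 3.58 (matched) and 3.17 (unmatched) reported for the criterion test.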


A 2 × 2 analysis of variance, comparing
data from the most distal (p. 1) and the
most proximal (p. 4) pages, under the
matched and unmatched conditions,
supports this conclusion. Item means rather
than subjects were used for the purposes
of this analysis. For the matching treatment,
the obtained F was not significant
(F = .97, df = 1/22), but the interaction
between matching and page location was
(F = 4.39, df = 1/22, p < .05). A t test
between matched and unmatched items from
Page 4 locations was significant (t = 2.18,
df = 22, p < .05). The matched and
unmatched conditions did not differ
significantly for Page 1 locations.


Discussion 


Experiment 1 confirmed McGaw and
Grotelueschen’s (1972) observation that
adjunct questions in the postreading position
can produce an indirect review effect. The
replication was obtained with materials that
were substantially identical to those used by
McGaw and Grotelueschen, but it involved
a larger number of test items and experimental
procedures which differed in several
ways from those used in their study. The
review effect became manifest through
facilitated performance on test items that
were topically related to the adjunct
questions. The effect was small. The performance
facilitation amounted to 12.8% of the comparison
score, an increase of only 4% in the
absolute proportion of correct responses.


EXPERIMENT 2 


The experimental passage and questions 
used in Experiment 1 were printed in mimeo- 
graphed booklets. Even though the subjects 
were closely monitored, the possibility could 
not be absolutely ruled out that the indirect 
review effect may have been produced by 
subjects surreptitiously leafing back to 
relevant portions of the text while trying to 
answer the experimental questions or imme- 
diately thereafter. In doing so, they may 
also have inspected materials relevant to 
the test item that was matched to the ad- 
junct question. The finding that the review 
effect was larger when the relevant materials 
are within one or two pages of the experi- 
mental question (see Table 3) would be con- 
sistent with this conjecture. Alternatively, 
this result may be an indication that the 
review effect operates only on representa- 
tions in primary memory (see Waugh & 
Norman, 1965) and that these decay or 
become less accessible in time. 

Experiment 2 was therefore performed in 
order to test for the indirect review effect 
when surreptitious reinspection of the text 
following the adjunct question was pre- 
cluded. For this purpose, all experimental 
material was presented on photographic 
slides under conditions that prevented the 
reexamination of the text after the initial 
inspection. A second purpose of Experiment 
2 was to explore priming effects associated 
with the successive presentation of two re- 
lated questions. 


Method 


Materials and General Procedure 


A negative 35-millimeter slide was prepared of 
each of the 24 typed pages of the 6,000-word ex- 
perimental passage used in Experiment 1. The 
slides were projected one at a time on a screen. 
Inspection time was controlled by the subject, 
but the subject could not return to any previously 
inspected slide. One adjunct question was included 
in the slide sequence after every second text page. 
Immediately after reading, a 24-item test was 
administered consisting of a sample of (a) ques- 
tions matched to adjunct questions; (b) questions 
not matched to any adjunct questions; (c) pairs 
of matched questions, one of which had been used
as an adjunct question; and (d) pairs of matched
questions neither of which had been used as an 
adjunct question. 


Adjunct Questions and Test 


The two sets of 12 matched questions (A1:A2;
B1:B2) employed in Experiment 1 were used. One
fourth of the subjects used A1, A2, B1, and B2,
respectively, as adjunct questions (AQ). One
adjunct question was administered every two text
slides (pages). It was derived from the two pages
which preceded its administration.

Immediately after reading, a 24-item
short-answer test was administered. This consisted of
(a) four questions (M) matched to adjunct
questions used during reading but without the matched
adjunct questions occurring in the test; (b) four
questions (M̄) for which no matching adjunct
questions had been administered during reading
and for which no matching questions were
presented in the test; (c) four pairs of matched
questions, one member of each pair had been used as
an adjunct question (AQp) which was followed
immediately in the test sequence by the question
that was matched to it (Mp); and (d) four pairs of
matched questions neither of which had been used
as adjunct questions; the first member of each
matched pair in the test sequence was designated
as Qp and was immediately followed by the
question that was matched to it (M̄p). The questions
were presented in the same order on the test as
the sequence of the source material in the text.
As a consequence, the matched pairs of questions,
AQp:Mp or Qp:M̄p, were always presented in
immediate sequence. The review effect may be
inferred if questions from Category M resulted in
more correct responses than those from Category
M̄. Priming may be inferred if Mp > M and if
M̄p > M̄ with respect to correct responses. It
may be concluded that adjunct questions had
direct instructive effects if AQp resulted in more
correct responses than Qp.

The equal use of the four question groups A1, A2,
B1, B2 as adjunct questions and the use of three
different test arrangements for each adjunct set
made it possible to have all 48 available questions
appear equally often in the various classifications
of test items.


Apparatus 


Each subject was seated in a booth that was 
fitted for rear projection of the text material. The 
projector advance was operated by the subject 
with a hand switch. The circuits connecting the 
subject’s switch to the projection equipment were 
wired through control panels in the experimenter’s 
booth. This allowed the experimenter to monitor 
each subject’s progress through the text. For fur-
ther details of the apparatus see Rothkopf and
Bloom (1970).


Procedure


The subject read the passage which was
projected, one page at a time, on the screen in front
of him. After every two text slides there was a
question slide distinguished by a green
background field. The subject wrote his answer to the
question on the answer sheet provided. The
experimenter gave the subject the written, 24-item
criterion test after the subject had read the entire
passage.

Study and test time were controlled by the
subject. The subjects were run in groups of two
to three persons.


Subjects


Paid volunteer students (n = 180) served as
subjects. Of these, 144 were college students and
36 were high school seniors. None had read The
Sea Around Us in the recent past, and none had
15 or more credit hours of college biology.


Results 


Review Effect 


Over all 180 subjects, each of the 48
possible test questions was assigned 15 times to
each of the several experimental roles. The
analytical comparisons of this experiment
were based on correct responses per
question averaged over all 48 items. The
review hypothesis was tested by comparing
performance between M and M̄ items. The
results were consistent with the review
hypothesis (X̄ for M = 6.08 > X̄ for M̄ = 5.44;
maximum score = 15). The effect is small
but statistically reliable. Reliability was
tested by comparing results for each of the
48 possible test items when it was assigned
to the M and M̄ role in the test. Performance
on 28 cases was greater for M than for the
corresponding item in the M̄ state; M̄ was
greater than M in 14 instances; and the two
conditions were exactly equal in 6 cases. A
sign test indicates that results such as these
would occur by chance with p < .04 (z =
2.01). These results therefore confirm those
of Experiment 1. Performance on a test item
was facilitated if a question that is topically
related had been asked during study of the
text.


Priming


Two comparisons relevant to the priming
hypothesis were possible in this experiment.
These were (a) when the priming item in
testing had also been used as an adjunct
question during reading and required the
comparison of Mp with M and (b) when the
priming item was not used as an adjunct
question, that is, the comparison between
M̄p and M̄. The latter comparison could also
be made by contrasting the performance
between Qp and M̄p. The Qp items always
preceded M̄p in the test sequence and for
that reason Qp may be thought of as unprimed
while M̄p is primed. However, since the test
was presented in booklet form, the order in
which the items were responded to was not
entirely certain and the nature of the Qp:M̄p
contrast was therefore less clear than
M̄:M̄p.

As indicated previously, the average num- 
ber of correct responses for M items was 
6.08 while the mean number of correct re- 
sponses for Mp was 6.10. It may therefore 
be concluded that there is little evidence 
for priming in items for which an adjunct 
question was presented during inspection of 
the text. 

On the other hand, the data indicate the
possibility of priming effects in test items
without matching adjunct questions during
the study of the text. The average number
of correct responses for M̄ items was 5.44
and for M̄p it was 6.08. Procedures similar
to the method used in the review comparison,
involving individual test items, were
again used. The results were as follows. For
27 test items, M̄p > M̄ and in 12 cases
M̄ > M̄p. Performance was exactly equal
on M̄ and M̄p items in 9 cases. A sign test
indicates p < .05 (z = 2.24). This result
must be interpreted with some caution since
the M̄ data have been used in a previous
comparison. Some further corroboration for
the conclusion that priming enhances
performance in items not matched by any
adjunct question may be found by noting
that the average number of correct responses
to Qp items (X̄ = 5.69) was also somewhat
lower than for M̄p (X̄ = 6.08). However,
in only 21 cases M̄p > Qp, in 17 cases
Qp > M̄p, and for 10 items M̄p = Qp, and
the difference was not significant.

Finally, it is worth noting that, as in
Experiment 1, substantial direct instructional
effects of questions were observed. The
powerful direct instructional effect of
questions was indicated by the high performance
on AQp items (X̄ = 8.13). When contrasted
with the Qp questions (X̄ = 5.69),
AQp > Qp for 33 items, Qp > AQp in 11
cases, and AQp = Qp for 4 items (z = 3.17,
p < .002).
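
The z values reported with the three item-by-item sign tests above are consistent with a normal approximation to the binomial that drops ties and applies a continuity correction. That the analysis was carried out in exactly this way is an assumption; the sketch below, with hypothetical function and label names, simply shows that this procedure reproduces the reported figures from the reported counts.

```python
import math


def sign_test_z(n_plus, n_minus):
    """z for a sign test: normal approximation, continuity corrected, ties dropped."""
    n = n_plus + n_minus                 # number of untied items
    larger = max(n_plus, n_minus)
    return (larger - 0.5 - n / 2) / math.sqrt(n / 4)


# (items favoring the first condition, items favoring the second), ties omitted
comparisons = {
    "M vs M-bar (review)": (28, 14),
    "M-bar-p vs M-bar (priming)": (27, 12),
    "AQp vs Qp (direct instruction)": (33, 11),
}

for label, (wins, losses) in comparisons.items():
    print(f"{label}: z = {sign_test_z(wins, losses):.2f}")
```

Because ties are discarded, the effective number of items differs across the three comparisons (42, 39, and 44 of the 48 items, respectively).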
Discussion 

The results of Experiment 2 indicate that
a response to a question improves
performance on other topically related items. The
finding that performance on M was higher
than that on M̄ replicates the review effect
observed in Experiment 1 under conditions
that rule out the possibility of surreptitious
reexamination of the text. However, in view
of the small magnitude of this effect, the
phenomenon is probably of more theoretical
than practical importance.

There were no observable test sequence
effects on performance when an item was
preceded by a matching question that had
been used as an adjunct question (i.e., Mp =
M). There was some indication, however,
that test sequence facilitation may have
taken place for those items that were
preceded in the testing sequence by matched
questions that were not used as adjunct
questions (e.g., M̄p > M̄ or M̄p > Qp). A
priming account can be accommodated to
these results. It would be more parsimonious,
however, to account for these data in terms
of indirect review. Such indirect review is
assumed to take place during attempts to
answer Qp and it improves the subject’s
ability to respond to M̄p, the item matched
to Qp which follows immediately after it in
the testing sequence.


EXPERIMENT 3 


Experiments 1 and 2 have provided
reasonably firm evidence that an attempt to
answer a question can improve the
subsequent ability to answer other questions from
the same topico-spatial neighborhood of the
text. Whether this is interpreted as an
indirect review or priming phenomenon
depends on the effect of the time interval
between the two related questions. If the
facilitating effect lasts for a substantial
period of time, that is, if the first questions
have produced a relatively permanent change
in memory, then an indirect review
interpretation can be invoked. Experiments 1 and 2
provided substantial indications of indirect
review phenomena. The priming hypothesis
depends on the demonstration of a weakening
of the facilitation effect as the temporal
interval between the two related (matched)
questions is increased. The data of
Experiment 2 were equivocal with respect to the
priming hypothesis, but priming continues
to be an interesting conjecture about
facilitation between two related questions because
priming effects have been obtained in
memory search (Meyer, 1973) and in word
association experiments (Martin, 1964).

In Experiment 2, test items preceded in 
the test sequence by a matched question 
that had not been an adjunct question re- 
sulted in somewhat higher performance than 
the questions that preceded them and also 
resulted in higher performance than un- 
matched, unprimed items. This suggests that 
priming may have played some role in sup- 
porting subjects’ test performance, but it 
also leaves open the possibility that the 
higher performance on the second question 
may have resulted from indirect review that 
took place during testing. 

Experiment 3 was intended to explore the 
priming factor by systematically manipu- 
lating the interval between priming events 
and subsequent related items in testing. 


Method 
Materials and Procedure 


Materials and procedure were exactly as in 
Experiment 2. Subjects read a 24-page passage 
which was projected one page at a time onto a 
screen in front of them. An experimental question 
(adjunct question) was embedded after every sec- 
ond page. The adjunct question was derived from 
information in the two pages immediately preced- 
ing it. Immediately after reading the passage, the 
subjects were tested. This criterion test (Crite- 
rion Test 1) included some adjunct questions as 
well as questions that were closely matched to 
the adjunct questions. A second criterion test 
(Criterion Test 2) similar to the first was ad- 
ministered either 5 or 30 minutes after the sub-
ject completed the first one. It included items
matched to questions in Criterion Test 1. During 
this delay period, the subjects judged comprehen- 
sion difficulty of a number of short, unrelated 
paragraphs. 


Tests 


Criterion Test 1 included 24 items. It consisted
of (a) four pairs of matched questions; one
member of each pair had been used as an adjunct
question (it will be referred to as AQip; adjunct
questions, immediate priming) which was followed
immediately in the test sequence by the question
that was matched to it (Mip); (b) four pairs of
matched questions neither of which had been used
as adjunct questions; the first member of each
matched pair in the test sequence was designated
Qip and was immediately followed by the question
that was matched to it (M̄ip); (c) four questions
that had been used as adjunct questions (referred
to as AQdp; adjunct questions, delayed priming),
the questions matched to these were presented in
Criterion Test 2; (d) four questions (Qdp) which
were not used as adjunct questions, with their
matching questions presented in Criterion Test 2.

Criterion Test 2 consisted of 24 items not used
in Criterion Test 1. These comprised (a) four
pairs of matched questions, one member of each
pair had been used as an adjunct question (it will
be referred to as AQip-2; adjunct questions,
immediate priming, Test 2) which was followed
immediately in the sequence by the question that was
matched to it (Mip-2); (b) four pairs of matched
questions neither of which had been used as
an adjunct question; the first member of each
matched pair in the test sequence was designated
Qip-2 and was immediately followed by the
question that was matched to it (M̄ip-2); (c) four
questions (Mdp-2) each matched to one of the AQdp
items in Criterion Test 1; and (d) four questions
(M̄dp-2) each matched to a Qdp item in Criterion
Test 1.

Performance on Mip and Mip-2 allows assessment
of immediate priming by questions seen
during reading (adjunct questions); performance
on M̄ip and M̄ip-2 should reflect immediate priming
by related questions (Q) not seen during reading.
Responses to Mdp-2 and M̄dp-2 make it possible to
examine any effects due to delay in priming.
These two classes of items were removed from the
matched adjunct questions or Q items by the
interval that separates Criterion Test 1 and
Criterion Test 2, that is, either 5 minutes or
30 minutes.

One fourth of all subjects were assigned
question groups A1, A2, B1, B2, respectively, as
adjunct questions. Four test arrangements were used
for each adjunct question assignment. This made
it possible to equalize the frequency with which
each criterion test question was used in each of
the four categories.


Subjects


Paid volunteer high school students (n = 72)
served as subjects. None had read The Sea Around
Us in the recent past. Half of these were assigned
in a nonsystematic manner to the 5-minute and
half to the 30-minute delay condition.

TABLE 4
MEAN CORRECT RESPONSES ON TEST ITEMS OF VARIOUS TYPES FOR THE TWO TREATMENTS
IN EXPERIMENT 3

                    Test 1                                    Test 2
Test item   5 minutes   30 minutes    X̄      Test item   5 minutes   30 minutes    X̄
AQip          1.92        1.94       1.93     AQip-2        1.61        2.08       1.85
Mip           1.42        1.67       1.54     Mip-2         1.39        1.56       1.47
Qip           1.50        1.39       1.44     Qip-2         1.39        1.61       1.50
M̄ip           1.28        1.56       1.42     M̄ip-2         1.11        1.78       1.44
AQdp          1.75        2.20       1.97     Mdp-2         1.33        1.53       1.43
Qdp           1.14        1.56       1.35     M̄dp-2         1.00        1.50       1.25

Note. Abbreviations: AQip, AQip-2 = adjunct questions which served as immediate primers;
Mip, Mip-2 = matched items which were immediately preceded by AQ primers; Qip, Qip-2 =
previously unseen questions which served as immediate primers; M̄ip, M̄ip-2 = items which
were immediately preceded by Q primers; AQdp, Qdp = adjunct questions and previously
unseen questions, respectively, which served as delayed primers; Mdp-2 and M̄dp-2 = items
matched to AQdp and Qdp primers.


Results 


The results from the two criterion tests
are summarized in Table 4. They must be
interpreted cautiously since each of the
treatment cells represents only 144 observations.
The major comparisons of interest were
between immediately primed (Mip-2, M̄ip-2) and
delayed primed items (Mdp-2, M̄dp-2) in the
second criterion test. In the four relevant
comparisons that were possible over the two
treatments, the primed items produced
slightly higher performance than the
delayed primed items in three out of four cases.
For purposes of statistical comparison,
scores on Mip-2 were combined with M̄ip-2, as
were scores on Mdp-2 with M̄dp-2. A 2 × 2
analysis of variance (Treatment × Priming
Delay) with repeated measures on the
second factor was performed on the
combined scores. The analysis of variance
indicates that the 30-minute group produced
more correct responses than the 5-minute
group (F = 5.60, df = 1/70, p < .05).
This was probably due to chance sampling
differences since, as may be noted from
Table 4, the 30-minute treatment
outperformed the 5-minute group for five out of six
test item types on the immediate tests
(Criterion Test 1) when the two treatments were
operating under the same conditions.
Performance in Criterion Test 2 was higher for
primed items in both treatments but the
priming factor was not significant (F =
1.14, df = 1/70) nor was the interaction
(F = .10, df = 1/70).

Inspection of these data also indicated 
that little or no retention loss took place 
during either the 5-minute or 30-minute 
delay interval. 

Another test of the priming hypothesis is
possible using the results from Criterion
Test 1. According to the priming hypothesis,
correct responses to the M_IP items that followed
Q primers should be more numerous than to Q_DP
questions. For the 5-minute group, this is the
case but the difference is not statistically
reliable. For the 30-minute group, the average
number of correct responses to these M_IP items
and to Q_DP questions is equal.

Performance on the M_IP items that followed AQ
primers would be superior to Q_DP if either the
review or the priming factor or both were
operative. The means for these M_IP items
under both the 5-minute and 30-minute
treatment were higher than for Q_DP. Combining
the two treatments and using the
item-by-item comparison and sign test used
in Experiment 2, M_IP was greater than Q_DP
in 19 cases and smaller in 10 (z = +1.49,
p < .07, one-tailed). It must therefore be
concluded that neither review nor priming
factor could clearly be demonstrated in this
case.
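
The sign test result reported above can be checked with a short sketch. It assumes the normal approximation with a continuity correction and that tied items were dropped before counting; the source does not state either detail, but this choice reproduces the reported z. The function name is illustrative only.

```python
import math


def sign_test_z(n_greater, n_smaller):
    """Normal approximation to the sign test with a continuity correction.

    Assumes ties were excluded, so n = n_greater + n_smaller.
    Returns (z, one-tailed p) for the hypothesis that "greater" is more common.
    """
    n = n_greater + n_smaller
    expected = n / 2.0            # expected count under the null hypothesis
    sd = math.sqrt(n) / 2.0       # binomial standard deviation with p = .5
    z = (abs(n_greater - expected) - 0.5) / sd
    if n_greater < expected:
        z = -z
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_tailed


z, p = sign_test_z(19, 10)
print(f"z = {z:.2f}, one-tailed p = {p:.3f}")
```

With 19 items favoring M_IP and 10 favoring Q_DP, the sketch gives a z that rounds to 1.49 and a one-tailed probability just under .07, in line with the values in the text.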
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GENERAL DISCUSSION 


The results of the experiments reported 
here support the hypothesis that being questioned
about a narrow topic while reading
enhances the recall of other material closely 
related to that topic. The data are consistent 
with the conception that the subject searches 
his memory while trying to answer an ad- 
junct question during reading. In doing so, 
the subject reviews and strengthens previ- 
ously established related memory represen- 
tations or makes them in other ways more 
accessible in subsequent tests. 

The relationship between the perform- 
ances facilitated by this indirect review proc- 
ess and the nature of the facilitating adjunct 
question requires further exploration. In the 
present studies, the matched questions were 
always drawn from the same topico-spatial* 
neighborhood of the text, but it is at least 
in principle possible that either topical or 
spatial factors may be sufficient to propa- 
gate the review effect. The absolute magni- 
tude of the indirect review effect, as ob- 
served here and also as reported by McGaw 
and Grotelueschen (1972), was extremely 
small. This suggests that the matching oper- 
ation which was done in a crude unsyste- 
matic manner needs to be characterized in 
more explicit terms. This is required in order 
to understand more deeply the dimensions 
along which the indirect review effect oper- 
ates. 

There are several weak suggestions that
some priming occurs in testing in the present
experiments. However, the experimental
facts do not offer sufficient support for this
conjecture and it must be rejected. The result
that is particularly fatal for the
priming hypothesis is that the “priming”
effects were not substantially attenuated by
interposing 30 minutes between matched
test items. It therefore seems more reasonable
to conclude that indirect review produced
the elevation in Mp items in Experiment 2
and the nearly identical performance
level on the M items under both immediate
and delayed priming conditions in Experiment 3.
The empirical principle may be
stated as follows: A question about any
given topico-spatial neighborhood of a text
tends to facilitate performance on test items
drawn from other text elements topically
related to that neighborhood.


* Although the spatial factor was not systematically
manipulated, it cannot be ruled out as a
contributor to the indirect review effect. This is
because in the present experiment, topical
relationships among text elements were always
accompanied by spatial contiguity.

An apparent exception to this rule is the
absence of elevated performance in M_2 items
on the adjunct question retest in Experiment
1. These items were preceded in the earlier
test (Criterion Test) by matched questions
(M_1) and should therefore have benefited by
indirect review in attempts to answer M_1.
However, any conclusion that facilitation
took place involves here a comparison of M_2
with M_1. This comparison is confounded by
the longer retention interval for M_2 items
and by possible destructive effects of testing.
Consequently, failure to find facilitated
performance on M_2 is not critical for the indirect
review hypothesis.

The priming hypothesis should not be
ruled out on the basis of the present
experiments alone. The common topical
threads which ran through most of the 6,000-
word experimental passage prevent a sensitive
test of the priming hypothesis. A more
powerful test would involve a passage with
many distinctively different topical themes.
SEX DIFFERENCES IN COMPREHENSION OF HIGH- AND
LOW-INTEREST READING MATERIAL


Previous research has found that elementary-school-age boys read
more poorly than girls. The present study investigated whether sex
differences in reading comprehension are affected by variations in the
interest level of the material. Fifth-grade children's interests were
assessed using a picture-rating technique. Each child then read
material that corresponded to his or her high- and low-interest areas.
The cloze procedure was used to measure comprehension. Results
indicated that boys read as well as girls on high-interest material but
that they were significantly poorer readers of low-interest material.


There is considerable evidence that elementary
school boys read more poorly than
girls. Boys lag as much as one half year
behind girls on reading achievement tests
(Asher & Gottman, 1973; Stroud & Lindquist,
1942), and between 60% and 90%
of elementary school children referred for
remedial reading instruction are male
(Blom, 1971). Although the poorer performance
of boys may reflect lack of skill,
it is possible that boys are simply less
motivated to engage in reading activity.
There is evidence that reading has less
appeal for boys. For example, boys view
reading as a sex-inappropriate activity
(Kagan, 1964; Stein & Smithells, 1969) and
have a more negative attitude toward reading
(Neale, Gill, & Tismer, 1970).

The purpose of the present study was to
evaluate the extent to which variations in
motivation affect sex differences in reading
comprehension. A powerful motivational
variable may be the interest level of the
material. One study of children's reading
books found that much of the material was
thematically appropriate for neither boys
nor girls (Blom, Waite, & Zimet, 1968). If
boys are particularly motivated by the interest
level of material, then their performance
could be strongly affected by variations
in reading material.

There is little research on the relationship
between the interest value of reading material
and reading comprehension. Data from
available studies are inconsistent, perhaps
as a result of problems in research design.
In one study (Shnayer, 1967), children's
interest in the material was assessed after
they had read the material and had been
tested for comprehension. Since children
may be more interested in material they
comprehend as well as in material that
appeals to them, the postreading interest measure is
of questionable validity. In other studies
(Bernstein, 1955; Klein, 1969; Stanchfield,
1967), children were given reading material
based on previous studies of children's
interests. Since individual children’s interests
differ from group norms, this procedure
introduces experimental error. Finally, previous
studies have used reading achievement
tests specifically developed for each
study with no prior demonstration of
reliability or validity. In many cases, item
selection appeared to be arbitrary.

To overcome these limitations, the present
study assessed individual children's interests
independently of any reading material. In
addition, a reliable and valid measure of
reading comprehension was employed. The
experiment consisted of three phases. In the 
first phase, a picture-rating technique was 
used to assess children’s interests. Students 
were shown a series of 25 photographic 
slides representing a wide range of topics. 
They rated the interest value of each slide. 
Based on the ratings, three high- and three 
low-interest areas were identified for each 
student. 

One week later, in the second phase of the
experiment, reading comprehension was
assessed. Each child read three passages
with topics that corresponded to high-interest
areas and three passages that corresponded
to low-interest areas. Passages were
presented in cloze format (Taylor, 1953);
every fifth word was deleted from each paragraph
and the child’s task was to supply each
of the missing words. Previous research with
the cloze procedure has found that it is reliable
and that it correlates highly with standardized
reading comprehension achievement
tests (Bormuth, 1967, 1968; Rankin &
Culhane, 1969). Furthermore, the clearly
specifiable rules for creating a cloze passage
eliminate the subjectivity and arbitrariness
inherent in many other approaches to item
construction.

In the third phase of the experiment, stu- 
dents were asked how much they would like 
to read more about each paragraph topic. 
This served to assess the validity of the 
picture technique for selecting material. If
the technique is valid, then children should 
prefer the paragraphs on topics correspond- 
ing to high-interest pictures. 

In addition to paragraph comprehension 
and paragraph preference data, standardized 
reading achievement test data were 
gathered from student records. From these 
data, it was determined whether the pres- 
ent sample of subjects shows a significant 
sex difference in reading achievement test
performance. Also, by correlating achievement
test scores with cloze scores, an additional
check on the validity of the cloze was
provided. Finally, the achievement test
scores were used to determine whether the
effect of interest is different for high- versus 
low-achieving students. 
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METHOD 


Subjects 


Eighty-seven fifth-grade children, 49 boys and 
38 girls, participated in the study. Fifth graders 
served as subjects since recent evidence (Asher & 
Gottman, 1973) indicates that sex differences are 
likely to be found in reading achievement at this 
grade level. The children, with the exception of
four who were repeatedly absent, constituted the 
entire fifth-grade population of a Champaign, 
Illinois, public school. Most were from middle- 
class homes. The children’s average IQ on a school- 
administered Scholastic Testing Service test was 
109. 


Materials 


Interest slides. Color slides were made by photo- 
graphing pictures from magazines and books. Pic- 
tures were selected to represent a wide range of 
interests for boys and girls. To ensure that each 
picture represented a single theme, the slides were 
shown to 34 fifth-grade children in another school. 
Children were asked to identify the topic of each 
slide; a naive judge sorted the topics assigned to 
each slide. Pictures used in the present study 
were given similar topics by 75% or more of the 
children. The 25 picture topics are listed below. 
This order of presentation was randomly selected. 


1. Forest
2. Jet Airplane
3. Priest
4. Dogs
5. Astronaut
6. Bride
7. Calf
8. Basketball Players
9. Butterflies
10. Marionettes
11. Monkey
12. Flowers
13. Bullfighting
14. Skiing
15. Food
16. Living Room
17. Maps
18. Painting
19. Circus
20. Race Cars
21. Canoe
22. Model Trains
23. Mother and Child
24. Insects
25. Cats


Reading material. Twenty-five passages cor- 
responding to the various picture topics were 
selected from the Britannica Junior Encyclopaedia 
(1970). This source is written for elementary 
school children in the fourth grade or older 
(Walsh, 1973) and provides a wide range of topics 
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in a more consistent style than would be obtained 
from diverse sources. The material was selected 
by a person who was unaware of the purpose of the 
experiment. He was given the list of picture topics 
and asked to locate a passage for each of the topics. 
Ten words were deleted from each passage. A 10- 
space line replaced each deleted word. The first 
deletion was the ninth word in the passage; subsequent
deletions were every fifth word. One entire
sentence appeared after the final deletion.
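
The deletion rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' own preparation procedure: words are taken to be whitespace-separated tokens, the function name and the placeholder passage are hypothetical, and the requirement that one entire sentence follow the final deletion is left to whoever selects the passage.

```python
def make_cloze(text, first_deletion=9, step=5, n_deletions=10, blank="_" * 10):
    """Return (cloze_text, answers) for a passage.

    Words are split on whitespace, so punctuation stays attached to a word;
    this is a simplification of how a printed cloze passage would be prepared.
    """
    words = text.split()
    # 1-based word positions 9, 14, 19, ... converted to 0-based indexes.
    positions = [first_deletion - 1 + i * step for i in range(n_deletions)]
    if positions[-1] >= len(words):
        raise ValueError("passage is too short for the requested deletions")
    answers = [words[i] for i in positions]
    for i in positions:
        words[i] = blank  # a 10-space line replaces each deleted word
    return " ".join(words), answers


# Placeholder 70-word passage; real passages came from the encyclopaedia.
passage = " ".join(f"w{i}" for i in range(1, 71))
cloze_text, answers = make_cloze(passage)
print(answers)
```

Under this rule the tenth deletion falls on the 54th word, so a usable passage needs at least that many words plus the closing sentence.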

Since the children did not choose their own 
reading material, it is unlikely that the objective 
difficulty of passages would vary by sex or interest 
condition. To test the hypothesis that passages 
were of similar difficulty across conditions,
readability coefficients (Dale & Chall, 1948) were
calculated. The Dale-Chall readability coefficient is
a function of the average sentence length of the
passage and the number of words in the passage
not found on a list of familiar words. The average
paragraph readability raw score was 6.53; this
corresponds to a seventh-grade equivalent score. High-
and low-interest readability coefficients were
calculated for each child by summing the raw scores
for the three high- and three low-interest passages.
A 2 X 2 (Sex X Interest) analysis of variance
performed on summed readability raw scores
indicated no significant sex (F = .13, df = 1/85),
interest (F = 3.18, df = 1/85), or sex by interest
(F = 1.30, df = 1/85) effects. Thus, the objective
difficulty of paragraphs did not vary across
conditions.
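
For readers unfamiliar with the readability coefficient, the sketch below applies the published Dale-Chall (1948) raw-score formula, which combines the percentage of words not on the familiar-word list with the average sentence length. The tokenizing helper and the five-word "familiar" set are hypothetical stand-ins: the actual Dale list contains roughly 3,000 words and the original procedure has counting rules (for names, inflected forms, and so on) that are not reproduced here.

```python
import re


def dale_chall_raw(n_words, n_sentences, n_unfamiliar):
    """Dale-Chall (1948) raw score from simple counts."""
    pct_unfamiliar = 100.0 * n_unfamiliar / n_words
    avg_sentence_length = n_words / n_sentences
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length + 3.6365


def count_features(text, familiar_words):
    """Rough counts of words, sentences, and unfamiliar words in a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    unfamiliar = [w for w in words if w.lower() not in familiar_words]
    return len(words), len(sentences), len(unfamiliar)


# Toy familiar-word set, for illustration only.
familiar = {"the", "cat", "sat", "dog", "ran"}
counts = count_features("The cat sat. The dog ran quickly.", familiar)
print(counts, round(dale_chall_raw(*counts), 2))
```

Each child's summed readability score would then be the sum of the raw scores of his or her three passages at a given interest level.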


Procedure 


The interest and comprehension tests were 
administered in two separate sessions one week 
apart. The children were tested in classrooms during
their reading period. To minimize the possibility
that children would perceive the connection
between the two activities, different experimenters
conducted the two sessions. This procedure was
apparently effective; in discussions with the children,
only one child inquired
about the relationship between the pictures and
the reading passages.


Interest Assessment 


Experimenter 1 told the children, "I'd like to
find out about what kids are interested in. I'm
going to show you 25 slides. For each slide I'd like
you to mark, on the sheets we'll give you, how 
interesting the picture is to you. Who knows what 
‘interesting’ means?" After a few children had 
responded, Experimenter 1 summarized their com- 
ments by saying, "So, something is interesting 
when you like it and would like to find out more 
about it." Experimenter 1 then distributed to each 
subject a form with twenty-five 1-7 rating scales, 
and drew a 1-7 scale on the blackboard. At the low 
end of each scale were the words "not at all in- 
teresting" and, at the high end, "very interesting." 
The nature and use of the rating scale were ex- 
plained: 
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If a picture is very interesting to you—if you 
like it very much and want to know more about 
it—mark a number at this end of the scale. [The 
experimenter pointed to Numbers 5, 6, and 7 of 
the scale on the blackboard.] You can mark it 
with a circle, an X, a check, or whatever you 
want. If a picture is not at all interesting to 
you—if you don't like it and wouldn't care to 
find out more about it—mark a number at the
low end of the scale. [The experimenter pointed 
to the Numbers 1, 2, and 3 of the scale.] If the 
picture is of medium interest to you—if you like 
it but don't like it a lot—mark a number here. 
[The experimenter pointed to Numbers 3, 4, and
5.] Let's try an example for practice. If I showed 
a picture of a pile of dollar bills, what number 
would you choose? [The experimenter called on 
several students.] If I showed a picture of a piece
of dirt, what number would you choose? [The 
experimenter again called on several students.]
So you can see that different people are interested
in different things. If anyone has any
questions, raise your hand and I'll try to answer 
them. [Experimenter 1 then presented the slides 
announcing the number of each one as it was 
projected.] Here's Picture Number 1... Here's 
Picture Number 2..., etc. 


The slides were presented at the rate of ap- 
proximately one every 10 seconds. When all slides 
had been rated, the subjects were asked to write 
their names on their rating sheet. 

Reading comprehension task. One week after
the interest assessment, Experimenter 2 gave the
children six passages to read. Three of the passage
topics corresponded to a child’s three highest
interest ratings and three to his or her lowest interest
ratings. When slides shared the third highest or
third lowest rating, topics were randomly selected
from those sharing equal ratings. Each of the six
passages, appropriately titled in upper-case
letters, was mimeographed on 8½ X 11 inch paper
and enclosed in a legal-size envelope. The
envelopes were numbered from one to six to
specify the order in which passages should be read.

Forty-three of the children, randomly selected,
read the passages in a high-low-high-low-high-low
interest sequence while the other 44 children read
them in a low-high-low-high-low-high sequence.
The particular positions of the three high- and
three low-interest passages within these
arrangements were randomly determined. In
addition to these six envelopes, each child received a
seventh envelope which contained the six
preference rating scales.
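
The selection and ordering rules just described can be sketched as follows. The function names, the sample ratings, and the use of a seeded random generator are assumptions made for illustration; the source says only that ties and positions were resolved at random, not how the randomization was carried out.

```python
import random


def pick_topics(ratings, k=3, rng=random):
    """Pick the k highest- and k lowest-rated topics, breaking ties at random.

    ratings maps each topic to a rating from 1 to 7.
    """
    topics = list(ratings)
    rng.shuffle(topics)  # random order first, so the stable sort breaks ties at random
    by_rating = sorted(topics, key=lambda t: ratings[t])
    return by_rating[-k:], by_rating[:k]  # (high, low)


def order_passages(high, low, start_high, rng=random):
    """Alternate high- and low-interest passages, with random positions within each set."""
    high, low = list(high), list(low)
    rng.shuffle(high)
    rng.shuffle(low)
    pairs = zip(high, low) if start_high else zip(low, high)
    return [topic for pair in pairs for topic in pair]


rng = random.Random(0)
# Hypothetical ratings for a handful of the 25 topics.
ratings = {"Race Cars": 7, "Astronaut": 7, "Dogs": 7, "Skiing": 7,
           "Maps": 4, "Food": 3, "Bride": 1, "Flowers": 1, "Insects": 2}
high, low = pick_topics(ratings, rng=rng)
envelopes = order_passages(high, low, start_high=True, rng=rng)
print(envelopes)
```

Here four topics share the top rating, so one of them is left out at random, and the six envelopes alternate high, low, high, low, high, low.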

Before the subjects were given the envelopes,
Experimenter 2 gave the following instructions:


I am going to show you a reading game.
[Experimenter 2 gave each child a sample paragraph.]
This is a paragraph with some words
missing. The idea is to read the paragraph and
decide what words are missing. Each paragraph
has 10 missing spaces. Take a minute to look at
the paragraph. [The experimenter paused.] OK.
Now I'll read the paragraph with all of the
words in it. You follow along with me. [The 
experimenter read the sample paragraph aloud, 
collected the sample paragraph from each sub- 
ject and then gave each subject the test 
envelopes.] 

You now have seven envelopes. Six have 
paragraphs in them. Start with the first para- 
graph and try to fill in all of the missing words. 
When you are done with a paragraph, put it 
back in the envelope and put it aside on your 
desk. Then you can go on to the second
envelope; then the third, fourth, fifth, and 
sixth. Once you put a paragraph in the envelope 
you can’t go back. Do you have any questions? 

OK. Read each paragraph carefully and try 
to fill in the missing words. I can’t help you 
read any of the words, but if you have trouble 
spelling any words raise your hand and I will 
help. Spelling doesn’t count in this game. If you 
are having trouble, don’t get stuck. Go on to 
the next part of the paragraph or a new paragraph.
You have 40 minutes for the six paragraphs.
That should be plenty of time. Any
questions? 

When you are done with the six paragraphs, 
open the seventh envelope. It contains some 
questions about how much you want to read 
more about each of the topics. If you would 
like to read more about it, circle one of the high 
numbers. If you wouldn’t like to read more 
about it, circle one of the low numbers. You 
can circle one of the numbers in the middle if 
that's how you feel. Got the idea? Any ques- 
tions? OK. You can begin. 


When each subject was finished, Experimenter
2 collected the material and unobtrusively recorded
the time. The average time for completing the
task was 16 minutes.

Cloze-scoring method. Children received cloze
scores based on the number of correct words
supplied for each high- and low-interest passage.
Only the exact word was accepted as a correct
response. The acceptance of synonyms does not
increase the validity of the cloze procedure (Bormuth,
1965), and it decreases scoring objectivity
and efficiency. Responses were considered to be
correct despite spelling errors if the supplied word
was clearly recognizable as the deleted word.
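
A minimal sketch of exact-word scoring is given below. It covers only the mechanical part of the rule: comparison ignores case and surrounding spaces, while the leniency for recognizable misspellings was a human judgment and is deliberately not modeled. The function name and the sample responses are hypothetical.

```python
def score_cloze(responses, answers):
    """Count responses that match the deleted word exactly (ignoring case and spaces).

    Synonyms earn no credit. Misspelled but recognizable words were accepted
    by the original scorers; that judgment call is not automated here.
    """
    if len(responses) != len(answers):
        raise ValueError("one response is expected per deletion")
    return sum(r.strip().lower() == a.strip().lower()
               for r, a in zip(responses, answers))


deleted = ["engine", "fast", "the", "track"]
child = ["Engine", "quick", "the", ""]       # "quick" is a synonym, so no credit
passage_score = score_cloze(child, deleted)
print(passage_score)
```

A child's high-interest or low-interest score would be the sum of such passage scores over the three passages at that interest level, giving a range of 0 to 30 with 10 deletions per passage.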


RESULTS 


Reading Achievement Test 


The Scholastic Testing Service test had 
been administered by the school system one 
month prior to the study. Data were avail- 
able for all but two students. The reading 
comprehension score for boys was 22.54 and 
for girls was 28.68; this difference is statistically
significant (t = 7.14, df = 83, p <
.01). Achievement test scores also were correlated
with the total number of deletions
correctly supplied on the cloze test. The 
relationship between the achievement and 
cloze measures was significant (r = .69, 
df = 83, p < .01). This is consistent with 
previous evidence that the cloze procedure 
is a valid measure of reading comprehension. 


Picture Ratings 


Since the purpose of the study was to 
compare boys and girls on high- and low- 
interest material, it was important to ensure 
that the highest three and lowest three pic- 
ture ratings were similar for each sex. For 
each child, the highest possible combined 
rating for his or her three most interesting 
pictures is 21. The lowest possible combined 
rating for his or her three least interesting 
pictures is 3. For boys, the average combined 
rating for the three most preferred pictures 
was 20.90 and for girls it was 20.95 (t = .76, 
df = 85). The average combined rating on 
the three lowest rated pictures was 3.96 for 
boys and 4.15 for girls (t = .45, df = 85). 
Thus, the pictures provided very interesting 
and very uninteresting topics for individuals 
of both sexes. 


Preference Ratings 

If the picture assessment technique is 
valid, it should predict children’s reading 
preferences. Table 1 presents the data on 
children’s desire to read more about high- 
and low-interest topics after having read all 
six passages. Given three 1-7 rating scales 
for the high-interest material and three for 
low-interest material, the reading preference 
scores could range from 3 to 21 for each 
level of interest. 


TABLE 1
READING PREFERENCE RATINGS

                        Sex of student
                     Boys               Girls
Interest level     M       SD         M       SD
High             15.63    4.84      14.66    4.59
Low         [illegible]   4.20       7.84    4.10


A 2 x 2 (Sex x Interest) analysis of variance
was performed on these data. Results
indicated that children significantly preferred
the high-interest material (F =
206.37, df = 1/85, p < .01), and there was
no significant difference between boys’ and
girls’ ratings (F = .05, df = 1/85). Furthermore,
the interaction of sex and interest was
nonsignificant (F = 2.16, df = 1/85). Both
boys and girls, then, expressed significantly
greater preference for reading the material
that corresponded to their high-interest
areas. These results validate the use of the
picture-interest assessment method since
picture ratings predicted the reading
preferences for both sexes.


Cloze Scores


Table 2 presents the reading comprehension
data. Each child received high- and
low-interest cloze scores based on the number
of his or her correct responses. Given
three high- and three low-interest paragraphs
and 10 deletions per paragraph, the
high- and low-interest cloze scores could
each range from 0 to 30. A 2 x 2 (Sex x Interest)
analysis of variance indicated that
girls read significantly better than boys
(F = 8.33, df = 1/85, p < .01) and children
comprehended more of the high- than low-
interest paragraphs (F = 38.81, df = 1/85).
Both of these main effects, however, are
qualified by a significant Sex x Interest
interaction (F = 7.17, df = 1/85, p < .01).


TABLE 2
CLOZE READING SCORES

                        Sex of student
                     Boys               Girls
Interest level     M       SD         M       SD
High              9.33    5.32      10.71    4.48
Low               5.57    3.91       9.32    4.49


As Table 2 shows, the effect of interest 
was stronger for boys than for girls. Tukey’s 
honestly significant difference test (Kirk, 
1968) was used for comparisons between 
groups. Boys’ high-interest performance was 
significantly superior to their low-interest 
performance (p < .01). Girls’ performance, 
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however, was not significantly affected by 
the interest level of the material. The sex
differences in reading comprehension also
varied with the interest level of passages.
On low-interest paragraphs, boys scored
significantly below girls (p < .01). However,
the two groups were not significantly
different on high-interest materials. Sex
differences in reading performance, then,
depended on children’s interest in the material.
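
The pattern behind the Sex x Interest interaction can be made concrete with the cell means reported in Table 2. The sketch below only does arithmetic on those four means; it is a descriptive illustration and does not reproduce the analysis of variance or the Tukey comparisons, which require the individual scores.

```python
# Cell means from Table 2 (cloze scores out of 30).
means = {
    ("boys", "high"): 9.33,
    ("boys", "low"): 5.57,
    ("girls", "high"): 10.71,
    ("girls", "low"): 9.32,
}

# Effect of interest within each sex (high minus low).
boys_gain = means[("boys", "high")] - means[("boys", "low")]
girls_gain = means[("girls", "high")] - means[("girls", "low")]

# Sex difference at each interest level (girls minus boys).
gap_low = means[("girls", "low")] - means[("boys", "low")]
gap_high = means[("girls", "high")] - means[("boys", "high")]

# Interaction contrast: how much larger the interest effect is for boys.
interaction = boys_gain - girls_gain

print(f"interest effect: boys {boys_gain:.2f}, girls {girls_gain:.2f}")
print(f"girls minus boys: low {gap_low:.2f}, high {gap_high:.2f}")
print(f"difference of differences: {interaction:.2f}")
```

The interest effect is 3.76 points for boys against 1.39 for girls, and the girls' advantage shrinks from 3.75 points on low-interest passages to 1.38 on high-interest passages.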

One interpretation of the interaction of 
sex and interest level is that the effect of 
interest is stronger for children who are 
lower achieving. Since boys in the present 
study scored lower than girls, the relevant 
sex-related variable may be achievement 
level. To examine this interpretation, the 
sample was divided at the median based 
on the Scholastic Testing Service reading 
comprehension score. Table 3 presents data 
on the effects of interest on high- and low- 
achieving students. The effect of interest on 
reading comprehension is similar in both 
high- and low-achieving groups. A two-way 
(Achievement Level x Interest) analysis 
of variance indicated that high-achieving 
children comprehended more than low- 
achieving children (F = 29.90, df = 1/83,
p < .01) and that comprehension was
greater on high-interest than low-interest
materials (F = 35.79, df = 1/83, p < .01).
Most important, the interaction of Achievement
Level x Interest was nonsignificant
(F = .00, df = 1/83). Thus, the Sex x Interest
interaction obtained in the preceding
analysis is not a function of differential impact
of interest on high- versus low-achieving
students. The reading comprehension
of both high- and low-achieving students
was facilitated by high-interest material.


TABLE 3
CLOZE READING SCORES BY ACHIEVEMENT LEVEL

                          Achievement level
                  High achievers         Low achievers
Interest level     M       SD           M            SD
High             12.09    3.91         7.32        3.8[?]
Low               9.34    3.97     [illegible]     4.18
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In summary, both girls and boys pre- 
ferred to read more about the material cor- 
responding to their high-interest topics. 
There was also a significant effect of in- 
terest on reading comprehension. However, 
the effect of interest on comprehension was 
stronger for boys than girls. The sex difference
in reading performance was significant
on low-interest material but not on high-interest
material. Finally, the effect of interest
was similar for high- and low-achieving
students.


DISCUSSION 


Results of this study indicate that previous
generalizations about the inferior
reading performance of boys need qualification.
When boys were interested in the
material, they read as well as girls, while
lack of interest produced results similar to
those reported in earlier studies. Boys’ performance
appears to have been particularly
facilitated by the high-interest material,
while girls comprehended nearly as much
of the low-interest material as the high.
From these findings, it appears that when
boys read more poorly than girls, it was because
of low motivation rather than lack of
skill. That boys read as well as girls on the
high-interest material indicates that they
had the ability to comprehend material if
they were motivated.

What is needed is a motivational account
which explains why boys were facilitated by
the high-interest material and girls were
relatively unaffected by the interest level
of material. One potential explanation is
based on theory and data concerning the
relationship between children’s sex role
standards and their reading performance.
Kagan (1964) has noted a discrepancy between
boys’ sex role standards and the sex
role connotations of school and reading.
There is evidence that boys and girls perceive
reading as a feminine activity (Kagan,
1964; Stein & Smithells, 1969). If reading is
sex-appropriate for girls, they may be likely
to read well regardless of the interest level
of the task. If reading is sex-inappropriate
for boys, they may require the additional
incentive provided by high-interest material
to read well. The three pictures most
interesting to boys were race cars, basketball
players, and astronauts. Topics such as these 
may make reading more “masculine.” Stein, 
Pohly, and Mueller (1971) have found that 
boys' performance is particularly affected 
by the sex-typing of an activity. 

An alternative to this motivational ex- 
planation is the possibility that the inter- 
action of sex and interest results from girls’ 
greater familiarity with vocabulary in a 
wider range of topic areas. If girls read more 
widely than boys, as some have suggested 
(e.g., Furness, 1963), then they might per- 
form similarly on high- and low-interest 
passages, while boys might perform better 
on high-interest material containing more 
familiar vocabulary. The notion that girls' 
wider reading activity extends to low-in- 
terest topics seems somewhat implausible 
in light of the current finding that girls, like 
boys, showed little enthusiasm for the low- 
interest material. Still, the familiarity-of- 
vocabulary interpretation merits considera- 
tion. It is currently being evaluated by 
controlling the vocabulary of passages while 
manipulating the topics. If the familiarity 
interpretation is correct, comprehension of 
vocabulary-controlled passages should be 
similar across interest level for both boys 
and girls. 

Another area for further research is the 
effect of interest on reading comprehension 
at different levels of passage difficulty. The 
passages used in the present study were 
somewhat difficult. The Dale-Chall formula 
indicated that the average passage was 
about two grades above the grade place- 
ment of children in the study. Still, children 
supplied the correct response for 29% of the 
cloze deletions, a percentage similar to that 
obtained when fourth- and seventh-grade 
children are given material selected by 
teachers to be appropriate to their grade 
level (Hansen & Hesse, 1974). Other studies 
indicate that a cloze score of 29% corre- 
sponds to approximately 65% correct on a 
multiple-choice achievement test (Bormuth, 
1967; Rankin & Culhane, 1969) and to 
considerable pretest to posttest information 
gain on a multiple-choice test (Bormuth, 
1968). Thus, children appear to have been 
deriving information from reading the pas- 
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sages used in the present study. Future 
studies might systematically vary passage 
difficulty in order to study the influence of 
this factor. 

A related issue is the effect of interest on 
sex differences in reading at different ages. 
Fifth-grade children were tested in the 
present study. Sex differences in reading 
achievement narrow between elementary 
and high school grades. Perhaps fifth-grade 
boys are transitional in their development 
and are more likely to be influenced by the 
interest level of the material than they 
would be at an earlier or later period. 

Finally, the present study has implications
for reading skill assessment and instruction.
The fact that sex differences were found
on the school-administered achievement test 
and the low-interest cloze passages, but not 
on the high-interest cloze material, implies 
that achievement test performance occurs 
under low-interest conditions. Achievement 
tests may not provide the kind of material 
that encourages boys to demonstrate their 
abilities. An interesting and critical issue is
whether high-interest material would have
long-term instructional benefits as well as
short-term assessment effects. Stanchfield
(1967) has reported success in teaching 
reading with high-interest materials. How- 
ever, in that study, the use of interesting 
material was one of a number of instruc- 
tional interventions. Furthermore, reading 
material in that study was the same for all 
children rather than being based on individ- 
ual interests. It remains to be determined 
whether the intervention of an individual- 
ized high-interest reading curriculum would 
narrow the gap between boys’ and girls’ 
reading performance. The interest assess- 
ment procedure used in the present study 
could easily be adapted to help individualize 
classroom reading programs. Results of the 
procedure predicted the reading preferences 
of boys and girls. 


REFERENCES 


Asher, S. R., & Gottman, J. M. Sex of teacher and 
student reading achievement. Journal of Educa- 
tional Psychology, 1973, 65, 168-171. 

Bernstein, M. R. The relationship between in- 
terest and reading comprehension. Journal of 
Educational Research, 1955, 49, 283-288. 


Blom, G. E. Sex differences in reading disability.
In E. Calkins (Ed.), Reading forum. Bethesda,
Md.: National Institute of Neurological
Disease and Stroke, 1971.

Blom, G. E., Waite, R. R., & Zimet, S. Content
of first grade reading books. The Reading
Teacher, 1968, 21, 317-323.

Bormuth, J. R. Validities of grammatical and
semantic classifications of cloze test scores.
Proceedings of the International Reading
Association, 1965, 10, 283-286.

Bormuth, J. R. Comparable cloze and multiple-choice
comprehension test scores. Journal of
Reading, 1967, 10, 291-299.

Bormuth, J. R. Empirical determination of the
instructional reading level. Proceedings of the
International Reading Association, 1968, 13,
716-721.

Britannica Junior Encyclopedia. Chicago:
Encyclopedia Britannica, 1970.

Dale, E., & Chall, J. S. A formula for predicting
readability: Instructions. Educational Research
Bulletin, 1948, 27, 37-54.

Furness, E. L. Researches on reading interests.
Education, 1963, 84, 3-7.

Hansen, L. H., & Hesse, K. D. An assessment of
reading literacy. Madison, Wisconsin: Madison
Public Schools, Department of Research and
Development, 1974.

Kagan, J. The child's sex role classification of
school objects. Child Development, 1964, 35,
1051-1056.

Kirk, R. E. Experimental design: Procedures for
the behavioral sciences. Belmont, Calif.:
Brooks/Cole, 1968.

Klein, H. A. Interest and comprehension in
sex-typed materials. Paper presented at the
International Reading Association Conference,
Kansas City, May 1969.

Neale, D. C., Gill, N., & Tismer, W. Relationship
between attitudes toward school subjects and
school achievement. Journal of Educational
Research, 1970, 63, 232-237.

Rankin, E. F., & Culhane, J. W. Comparable
cloze and multiple-choice test scores. Journal of
Reading, 1969, 13, 193-198.

Shnayer, S. W. Some relationships between reading
interests and reading comprehension.
Unpublished doctoral dissertation, University of
California, Berkeley, 1967.

Stanchfield, J. M. The effect of high-interest
materials on reading achievement in the first
grade. National Reading Conference Yearbook,
1967, 16, 58-61.

Stein, A. H., Pohly, S. R., & Mueller, E. The
influence of masculine, feminine, and neutral tasks
on children's achievement behavior, expectancies
of success, and attainment values. Child
Development, 1971, 42, 195-207.

Stein, A. H., & Smithells, J. Age and sex differences
in children's sex-role standards about
achievement. Developmental Psychology, 1969,
1, 252-259.

Stroud, J. B., & Lindquist, E. F. Sex differences in
achievement in the elementary and secondary
schools. Journal of Educational Psychology, 1942,
33, 657-667.

Taylor, W. L. “Cloze procedure”: A new tool for
measuring readability. Journalism Quarterly,
1953, 30, 415-433.

Walsh, S. P. General Encyclopedias in Print
1973-1974: A Comparative Analysis. New York:
R. R. Bowker, 1973.

(Received December 17, 1973)


Journal of Educational Psychology 
1974, Vol. 66, No. 5, 688-701 


ESTIMATING CAUSAL EFFECTS OF TREATMENTS IN 
RANDOMIZED AND NONRANDOMIZED STUDIES 


DONALD B. RUBIN*

A discussion of matching, randomization, random sampling, and other
methods of controlling extraneous variation is presented. The objective
is to specify the benefits of randomization in estimating causal effects
of treatments. The basic conclusion is that randomization should be
employed whenever possible but that the use of carefully controlled
nonrandomized data to estimate causal effects is a reasonable and
necessary procedure in many cases.


Recent psychological and educational 
literature has included extensive criticism 
of the use of nonrandomized studies to 
estimate causal effects of treatments (e.g., 
Campbell & Erlebacher, 1970). The im- 
plication in much of this literature is that 
only properly randomized experiments can 
lead to useful estimates of causal effects. If 
taken as applying to all fields of study, this 
position is untenable. Since the extensive 
use of randomized experiments is limited to 
the last half century, and in fact is not
used in much scientific investigation today, 
one is led to the conclusion that most 
scientific “truths” have been established 
without using randomized experiments. In 
addition, most of us successfully determine 
the causal effects of many of our everyday 
actions, even interpersonal behaviors, with- 
out the benefit of randomization. 

Even if the position that causal effects of 
treatments can only be well established from 
randomized experiments is taken as ap- 
plying only to the social sciences in which 


*I would like to thank E. J. Anastasio, A. E.
Beaton, W. G. Cochran, K. M. Kazarow, and
R. L. Linn for helpful comments on earlier versions
of this paper. I would also like to thank the
U. S. Office of Education for supporting work on
this paper under contract OEC-0-71-3715.

*Requests for reprints should be sent to Donald
B. Rubin, Division of Data Analysis Research,
Educational Testing Service, Princeton, New Jersey
08540.

*Essentially since Fisher (1925).

*For example, in Davies (1954), a well-known
textbook on experimental design in industrial work,
randomization is not emphasized.

there are currently few well-established
causal relationships, its implication, to
ignore existing observational data, may be
counter-productive. Often the only immediately
available data are observational
(nonrandomized) and either (a) the cost of
performing the equivalent randomized experiment
to test all treatments is prohibitive
(e.g., 100 reading programs under study);
(b) there are ethical reasons why the treatments
cannot be randomly assigned (e.g.,
estimating the effects of heroin addiction on
intellectual functioning); or (c) estimates
based on results of experiments would be
delayed many years (e.g., effect of childhood
intake of cholesterol on longevity).
In cases such as these, it seems more reasonable
to try to estimate the effects of the
treatments from nonrandomized studies
than to ignore these data and dream of the
ideal experiment or make “armchair”
decisions without the benefit of data analysis.
Using the indications from nonrandomized
studies, one can, if necessary,
initiate randomized experiments for those
treatments that require better estimates or
that look most promising.

The position here is not that randomization
is overused. On the contrary, given a
choice between the data from a randomized
experiment and an equivalent nonrandomized
study, one should choose the data
from the experiment, especially in the social
sciences where much of the variability is
often unassigned to particular causes. However,
we will develop the position that
nonrandomized studies as well as randomized
experiments can be useful in estimating
causal treatment effects.

In order to avoid unnecessary complica- 
tion, we will restrict discussion to the very 
simple study consisting of 2N units (e.g., 
subjects), half having been exposed to an 
experimental (E) treatment (e.g., a compensatory
reading program) and the other
half having been exposed to a control (C) 
treatment (e.g., a regular reading program). 
If Treatments E and C were assigned to the 
2N units randomly, that is, using some 
mechanism that assured each unit was 
equally likely to be exposed to E as to C, 
then the study is called a randomized ex- 
periment or more simply an experiment; 
otherwise, the study is called a nonran- 
domized study, a quasi-experiment, or an 
observational study. The objective is to 
determine for some population of units (e.g., 
underprivileged sixth-grade children) the 
“typical” causal effect of the E versus C 
treatment on a dependent variable Y,
where Y could be dichotomous (e.g., suc- 
cess-failure) or more continuous (e.g., score 
on a given reading test). The central ques- 
tion concerns the benefits of randomiza- 
tion in determining the causal effect of the 
E versus C treatment on Y. 


DEFINING THE CAUSAL EFFECT OF THE E
VERSUS C TREATMENT
Intuitively, the causal effect of one treatment,
E, over another, C, for a particular
unit and an interval of time from $t_1$ to $t_2$ is
the difference between what would have
happened at time $t_2$ if the unit had been
exposed to E initiated at $t_1$ and what would
have happened at $t_2$ if the unit had been
exposed to C initiated at $t_1$: “If an hour ago
I had taken two aspirins instead of just a
glass of water, my headache would now be
gone,” or “Because an hour ago I took two
aspirins instead of just a glass of water, my
headache is now gone.” Our definition of the
causal effect of the E versus C treatment will
reflect this intuitive meaning.

First define a trial to be a unit and an
associated pair of times, $t_1$ and $t_2$, where
$t_1$ denotes the time of initiation of a treatment
and $t_2$ denotes the time of measurement
of a dependent variable, Y, where
$t_1 < t_2$. We restrict our attention to Treatments
E and C that could be randomly
assigned; thus, we assume (a) a time of
initiation of treatment can be ascertained
for each unit exposed to E or C and (b)
E and C are exclusive of each other in the
sense that a trial cannot simultaneously
be an E trial and a C trial (i.e., if E is
defined to be C plus some action, the
initiation of both is the initiation of E; if E
and C are alternative actions, the initiation
of both E and C is the initiation of neither
of these but rather of a third treatment,
E + C).

Now define the causal effect of the E
versus C treatment on Y for a particular
trial (i.e., a particular unit and associated
times $t_1$, $t_2$) as follows:


Let $y(E)$ be the value of Y measured* at
$t_2$ on the unit, given that the unit received
the experimental Treatment E
initiated at $t_1$;

Let $y(C)$ be the value of Y measured at
$t_2$ on the unit, given that the unit received
the control Treatment C initiated
at $t_1$;

Then $y(E) - y(C)$ is the causal effect of
the E versus C treatment on Y for that
trial, that is, for that particular unit and
the times $t_1$, $t_2$.


For example, assume that the unit is a 
particular child, the experimental treat- 
ment is an enriched reading program, and 
the control treatment is a regular reading 
program. Suppose that if the child were 
given the enriched program initiated at time 


*The measured value of Y stated with reference
to time $t_2$ is considered the “true” value of Y at $t_2$.
This position can be justified by defining Y by a
measuring instrument that always yields the measured
Y (e.g., Y is the score on a particular IQ
test as recorded by the subject’s teacher). Since
an “error” in the measured Y can only be detected
by a “better” measuring instrument (e.g., a
machine-produced score on that same IQ test),
the values of a “truer” score can be viewed as the
values of a different dependent variable. Clearly,
any study is more meaningful to the investigator
if the dependent variable better reflects underlying
concepts he feels are important (e.g., is more
accurate), but that does not imply he must consider
errors about some unmeasurable “true score.”
For the reader who prefers the concept of such
errors of measurement, he may consider the following
discussion to assume negligible “technical
errors” so that Y is essentially the “true” Y.

$t_1$, 10 days later at time $t_2$ he would have a
score of 38 items correct on a reading test;
and suppose that if the child instead were
given the regular program initiated at time
$t_1$, at time $t_2$ he would score 34 items correct.
Then the causal effect on the reading
test for that trial (that child and times $t_1$,
$t_2$) of the enriched program versus the
regular program is 38 - 34 = 4 more items
correct.

The problem in measuring $y(E) - y(C)$
is that we can never observe both $y(E)$ and
$y(C)$ since we cannot return to time $t_1$ to
give the other treatment. We may have the 
same unit measured on both treatments in 
two trials (a repeated measure design), but 
since there may exist carryover effects (e.g., 
the effect of the first treatment wears off 
slowly) or general time trends (e.g., as the 
child ages, his learning ability increases), 
we cannot be certain that the unit’s re- 
sponses would be identical at both times. 

Assume now that there are M trials for
which we want the “typical” causal effect.
For simplicity of exposition, assume that
each trial is associated with a different unit
and expand the above notation by adding
the subscript j to denote the jth trial (j =
1, 2, ..., M); thus $y_j(E) - y_j(C)$ is the
causal effect of the E versus C treatment
for the jth trial (that is, the jth unit and the
associated times of initiation of treatment,
$t_1$, and measurement of Y, $t_2$).

An obvious definition of the “typical” 
causal effect of the E versus C treatment 
for the M trials is the average (mean) causal 
effect for the M trials: 


$$\frac{1}{M} \sum_{j=1}^{M} [y_j(E) - y_j(C)].$$


Even though other definitions of typical
are interesting,* they lead to more complications
when discussing properties of estimates
under randomization. Hence we
assume the average causal effect is the desired
typical causal effect for the M trials
and proceed to the problem of its estimation
given the obvious constraint that we
can never actually measure both $y_j(E)$
and $y_j(C)$ for any trial.


*Notice that if all but one of the individual
causal effects are small and that one is very large,
the average causal effect may be substantially
larger than all but one of the individual causal
effects and thus not very “typical.” Other possible
definitions of the typical causal effects for the M
trials are the median causal effect (the median of
the individual causal effects) or the midmean
causal effect (the average of the middle half of
the individual causal effects). If the individual
causal effects, $y_j(E) - y_j(C)$, are approximately
symmetrically distributed about a central value,


RANDOMIZATION, MATCHING, AND
ESTIMATING THE TYPICAL
CAUSAL EFFECT IN THE
2N TRIAL STUDY


For now assume that the objective is to 
estimate the typical causal effect only for 
the 2N trials in the study. Of course, in 
order for the results of a study to be of much 
interest, we must be able to generalize to 
units and associated times other than those 
in the study. However, the issue of generalizing
results to other trials is discussed
separately from the issue of estimating the
typical causal effect for the trials under
study. Also, for now we only consider the
simple and standard estimate of the typical
causal effect of E versus C: the average
difference between those units who received
E and those units who received C.
After considering this estimate when there 
are only two trials in the study and then 
when there are 2N (N > 1) trials in the 
study, we will more formally discuss two 
benefits of randomization. 


Two-Trial Study 


Suppose there are two trials under study,
one trial having a unit exposed to E and the
other having a unit exposed to C. The
typical causal effect for the two trials is


$$\frac{1}{2} [y_1(E) - y_1(C) + y_2(E) - y_2(C)]. \qquad [1]$$


The estimate of this quantity from the
study, the difference between the measured
Y for the unit who received E and the
measured Y for the unit who received C, is
either


$$y_1(E) - y_2(C) \qquad [2]$$
or
$$y_2(E) - y_1(C) \qquad [3]$$


sensible definitions of “typical” will yield similar
values.

depending upon which unit was assigned E.
Neither Equation 2 nor Equation 3 is neces- 
sarily close to Equation 1 or to the causal 
effect for either unit 


$$y_1(E) - y_1(C) \qquad [4]$$
or
$$y_2(E) - y_2(C), \qquad [5]$$


even if these individual causal effects are 
equal. If the Treatments E and C were 
randomly assigned to units, we are equally 
likely to have observed the difference in 
Equation 2 as that in Equation 3, so that 
the average or “expected” difference in Y 
between experimental and control units is 
the average of Equations 2 and 3, 


$$\frac{1}{2} [y_1(E) - y_2(C)] + \frac{1}{2} [y_2(E) - y_1(C)],$$


which equals Equation 1, the typical causal 
effect for the two trials. For this reason, if 
the treatments are randomly assigned, the 
difference in Y between the experimental 
and control units is called an “unbiased” 
estimate of the desired typical causal effect. 
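
The two-trial argument can be checked numerically. The following is a minimal sketch, assuming made-up potential outcomes for two units (the values 38, 34, 30, and 29 are illustrative, and the variable names are not from the paper). It computes Equations 1, 2, and 3 and confirms that the average of Equations 2 and 3 equals Equation 1, even though neither observable difference is close to it.

```python
from fractions import Fraction

# Hypothetical potential outcomes (illustrative values only).
# y[j] = (y_j(E), y_j(C)): what unit j would score under E and under C.
y = {1: (Fraction(38), Fraction(34)), 2: (Fraction(30), Fraction(29))}

# Equation 1: the typical (average) causal effect for the two trials.
eq1 = sum(e - c for e, c in y.values()) / len(y)

# Equation 2: observed difference if unit 1 received E and unit 2 received C.
eq2 = y[1][0] - y[2][1]

# Equation 3: observed difference if unit 2 received E and unit 1 received C.
eq3 = y[2][0] - y[1][1]

# Under random assignment each of the two allocations has probability 1/2,
# so the expected E - C difference is the simple average of eq2 and eq3.
expected = (eq2 + eq3) / 2

print("Equation 1:", eq1)                   # 5/2
print("Equations 2 and 3:", eq2, eq3)       # 9 and -4
print("Expected difference:", expected)     # 5/2
```

In this illustration neither single difference (9 or -4) is near the typical effect of 5/2; only their average over the two equally likely assignments recovers it, which is all that unbiasedness promises.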
Now suppose that the two units are very
similar in the way they respond to the E
and C treatments at the times of their trials.
By this we mean that on the basis of
“extra information,” we know $y_1(E)$ is about
equal to $y_2(E)$ and $y_1(C)$ is about equal to
$y_2(C)$; that is, the two trials are closely
“matched” with respect to the effects of the
two treatments. It then follows that Equation 2
is about equal to Equation 3, and
both are about equal to the desired typical
causal effect in Equation 1. In fact, if the
two units react identically in their trials,
Equation 5 = Equation 4 = Equation 3 =
Equation 2 = Equation 1, and randomization
is absolutely irrelevant. Clearly, having
closely “matched” trials increases the
closeness of the calculated experimental
minus control difference to the typical causal
effect for the two trials, while random assignment
of treatments does not improve
that estimate.
Although two-trial studies are almost unheard
of in the behavioral sciences, they are
not uncommon in the physical sciences. For
example, when comparing the heat expansion
rates (per hour) of a metal alloy in
oxygen and nitrogen, an investigator might
use 2 one-foot lengths of the alloy. Because
the lengths of alloy are so closely matched 
before being exposed to the treatment (al- 
most identical compositions and dimen- 
sions), the units should respond almost 
identically to the treatments even when 
initiated at different times, and thus the 
calculated experimental (oxygen) minus 
control (nitrogen) difference should be an 
excellent estimate of the typical causal 
effect, Equation 1. 

A skeptical observer, however, could al- 
ways claim that the experimental minus 
control difference is not a good estimate of
the typical causal effect of the E versus C
treatment because the two units were not
absolutely identical prior to the application 
of the treatments. For example, he could 
claim that the length of alloy molded first 
would expand more rapidly. Hence, he 
might argue that what was measured was 
really the effect of the difference in order of 
manufacture, not the causal effect of the 
oxygen versus nitrogen treatment. Since 
units are never absolutely identical before 
the application of treatments, this kind of 
argument, whether “sensible” or not, can 
always be made. Nevertheless, if the two 
trials are closely matched with respect to 
the expected effects of the treatments, that 
is, if (a) the two units are matched prior to 
the initiation of treatments on all variables 
thought to be important in the sense that 
they causally affect Y and (b) the possible 
effect of different times of initiation of 
treatment and measurement of Y are con- 
trolled, then the investigator can be con- 
fident that he is in fact measuring the 
causal effect of the E versus C treatment 
for those two trials. This kind of confidence 
is much easier to generate in the physical 
sciences where there are models that suc- 
cessfully assign most variability to specific 
causes than in the social sciences where often 
important causal variables have not been 
identified. 

Another source of confidence that the ex- 
perimental minus control difference is a good 
estimate of the causal effect of E versus C is 
replication: Are similar results obtained 
under similar conditions? One type of 
replication is the inclusion of more than two 
trials in the study.


The 2N Trial Study


Suppose there are 2N trials (N > 1) in
the study, half with N units having received
the E treatment and the other half with N
units having received the C treatment. The
immediate objective is to find the typical
causal effect of the E versus C treatment on
Y for the 2N trials, say $\tau$:


$$\tau = \frac{1}{2N} \sum_{j=1}^{2N} [y_j(E) - y_j(C)].$$


Let $S_E$ denote the set of indices of the E
trials and $S_C$ denote the set of indices
of the C trials ($S_E \cup S_C = \{j = 1, 2, \ldots, 2N\}$).
Then the difference between the
average observed Y in the E trials and the
average observed Y in the C trials can be
expressed as


$$\bar{y}_d = \frac{1}{N} \sum_{j \in S_E} y_j(E) - \frac{1}{N} \sum_{j \in S_C} y_j(C),$$


where $\sum_{j \in S_E}$ and $\sum_{j \in S_C}$ indicate, respectively,
summation over all indices in $S_E$
(i.e., all E trials) and over all indices in $S_C$
(i.e., all C trials). We now consider how
close this estimate $\bar{y}_d$ is to the typical causal
effect $\tau$ and what advantage there might
be if we knew the treatments were randomly
assigned.

First, assume that for each unit receiving 
E there is a unit receiving C, and the two 
units react identically at the times of their 
trials; that is, the 2N trials are actually N 
perfectly matched pairs. We now show that
the estimate $\bar{y}_d$ in this case equals $\tau$. $\bar{y}_d$ can
be expressed as the average experimental
minus control (E - C) difference across
the N matched trials. Since the (E - C)
difference in each matched pair of trials is 
the typical causal effect for both trials of 
that pair, the average of those differences is 
the typical causal effect for all N pairs and 
thus all 2N trials. This result holds whether 
the treatments were randomly assigned or 
not. In fact, if one had N identically matched 
pairs, a “thoughtless” random assignment 
could be worse than a nonrandom assign- 
ment of E to one member of the pair and C 
to the other. By “thoughtless” we mean 
some random assignment that does not 
assure that the members of each matched 
pair get different treatments, such as picking the
N indices to receive E “from a hat” containing
the numbers 1 through 2N rather
than tossing a fair coin for each matched
pair to see which unit is to receive E.
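
The contrast between the two assignment schemes can be illustrated with a small enumeration. This is a sketch under stated assumptions: the three perfectly matched pairs and their outcome values are hypothetical, and units 2k and 2k+1 are taken to form pair k. It shows that a coin toss within each pair always reproduces $\tau$ exactly, whereas drawing N indices “from a hat” can miss it, although it remains correct on average.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical perfectly matched pairs: both members of a pair share the
# same potential outcomes (y(E), y(C)). Values are illustrative only.
pairs = [(10, 7), (20, 18), (5, 1)]
units = [p for p in pairs for _ in range(2)]   # units 2k, 2k+1 form pair k
N = len(pairs)

# Typical causal effect over all 2N trials.
tau = Fraction(sum(e - c for e, c in units), 2 * N)

def avg_diff(e_indices):
    """Average observed Y in E trials minus average observed Y in C trials."""
    e_set = set(e_indices)
    e_mean = Fraction(sum(units[j][0] for j in e_set), N)
    c_mean = Fraction(sum(units[j][1] for j in range(2 * N) if j not in e_set), N)
    return e_mean - c_mean

# One fair coin toss per matched pair: bit b picks which member receives E.
paired = [avg_diff(2 * k + b for k, b in enumerate(bits))
          for bits in product((0, 1), repeat=N)]

# "From a hat": any N of the 2N indices receive E, ignoring the pairing.
hat = [avg_diff(s) for s in combinations(range(2 * N), N)]

print("tau:", tau)
print("distinct paired estimates:", sorted(set(paired)))
print("range of hat estimates:", min(hat), "to", max(hat))
```

With these values every within-pair allocation returns exactly $\tau$, while some hat allocations place both members of a pair in the same group and so depart from it.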

In practice, of course, we never have
exactly matched trials. However, if matched
pairs of trials are very similar in the sense
that prior to the initiation of treatments the
investigator has controlled those variables
that might appreciably affect Y, then $\bar{y}_d$
should be close to $\tau$. If, in addition, the
estimated causal effect is replicable in the
sense that the N individual estimated causal
effects for each matched pair are very similar,
the investigator might feel even more
confident that he is in fact estimating the
typical causal effect for the 2N trials (e.g.,
2N children from the same school matched
by sex and initial reading score into N pairs
with the same observed E - C difference
in final score in each matched pair). Similarly,
if the trials are not pair-matched but
are all similar (e.g., all children are males
from the same school with similar pretest
scores) and if we observe that all $y_j(E)$,
$j \in S_E$, are about equal and all $y_j(C)$, $j \in S_C$, are
about equal, the investigator would also
feel confident that he is in fact estimating
the typical causal effect for the 2N trials.

Nevertheless, it is obvious that if treatments
were systematically assigned to units,
the addition of replication evidence cannot
dissuade the critic who believes the effect
being measured is due to a variable used to
assign treatments (e.g., in the reading study,
if more active children always received the
enriched program, or in the heat-expansion
study, if the first molded alloy was always
measured in oxygen). If treatments were
randomly assigned, all systematic sources
of bias would be made random, and thus it
would be unlikely, especially if N is large,
that almost all E trials would be with the
more active children or the first molded
alloy. Hence, any effect of that variable
would be at least partially balanced in the
sense of systematically favoring neither the
E treatment nor the C treatment over the
2N trials. In addition, using the replications
there could be evidence to refute the skeptic’s
claim of the importance of that variable
(e.g., in each matched trial we get about the
same estimate whether the more active child
gets E or C). Of course, if we knew the
skeptic’s claim beforehand, a specific control 
of this additional variable would be more 
advisable than relying on randomization 
(e.g., in a random half of the matched trials 
assign E to the more active child, and in 
the other half assign C to the more active 
child, or include the child's activity as a 
matching variable). 

It is important to realize, however, that 
whether treatments are randomly assigned 
or not, no matter how carefully matched 
the trials, and no matter how large N, a 
skeptical observer could always eventually 
find some variable that systematically 
differs in the E trials and C trials (e.g., 
length of longest hair on the child) and 
claim that $\bar{y}_d$ estimates the effect of this
variable rather than $\tau$, the causal effect of
the E versus C treatment. Within the experi- 
ment there can be no refutation of this 
claim; only a logical argument explaining 
that the variable cannot causally affect the 
dependent variable or additional data out- 
side the study can be used to counter it. 


TWO FORMAL BENEFITS OF
RANDOMIZATION


If randomization can never assure us that 
we are correctly estimating the causal effect 
of E versus C for the 2N trials under study, 
what are the benefits of randomization 
besides the intuitive ones that follow from 
making all systematic sources of bias into 
random ones? Formally, randomization 
provides a mechanism to derive probabilistic 
properties of estimates without making
further assumptions. We will consider two
such properties that are important:


1. The average E - C difference is an
“unbiased” estimate of $\tau$, the typical
causal effect for the 2N trials.

2. Precise probabilistic statements can
be made indicating how unusual the observed
E - C difference, $\bar{y}_d$, would be
under specific hypothesized causal effects.


More advanced discussion of the formal
benefits of randomization may be found in
Scheffé (1959) and Kempthorne (1952).


Unbiased Estimation over the Randomization Set


We begin by defining the “randomization
set” to be the set of r allocations that
were equally likely to be observed given
the randomization plan. For example,
if the treatments were randomly assigned
to trials with no restrictions (the completely
randomized experiment, Cochran & Cox,
1957), each one of the $\binom{2N}{N}$ possible
allocations of N trials to E and N trials to C
was equally likely to be the observed allocation.
Thus, the collection of all of these
$r = \binom{2N}{N}$ allocations is known as the
randomization set for this completely randomized
experiment. If the treatments were
assigned randomly within matched pairs
(the randomized blocks experiment, Cochran
& Cox, 1957), any of the $2^N$ allocations,
with each member of the pair receiving a
different treatment, was equally likely to be
the observed one. Hence, for the experiment
with randomization done within matched
pairs, the collection of these $r = 2^N$ equally
likely allocations is known as the randomization
set.
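
The two randomization sets can be enumerated directly for a small study. The sketch below assumes N = 3 and a hypothetical convention that trials 2k and 2k+1 form matched pair k; the function names are illustrative. Each allocation is represented by the set of trial indices that receive E.

```python
from itertools import combinations, product
from math import comb

def completely_randomized(n_pairs):
    """All allocations that send N of the 2N trials to E, with no restrictions."""
    trials = range(2 * n_pairs)
    return [frozenset(s) for s in combinations(trials, n_pairs)]

def randomized_within_pairs(n_pairs):
    """All allocations in which exactly one member of each matched pair gets E.

    Assumes trials 2k and 2k+1 form pair k (an illustrative convention).
    """
    return [frozenset(2 * k + b for k, b in enumerate(bits))
            for bits in product((0, 1), repeat=n_pairs)]

N = 3
full = completely_randomized(N)
paired = randomized_within_pairs(N)

print(len(full), comb(2 * N, N))   # 20 20
print(len(paired), 2 ** N)         # 8 8
```

Every within-pair allocation is also one of the unrestricted allocations, so the second randomization set is a subset of the first.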

For each of the r possible allocations in 
the randomization set, there is a corre- 
sponding average E — C difference that 
would have been calculated had that allocation
been chosen. If the expectation (i.e.,
average) of these r possible average differences
equals $\tau$, the average E - C difference
is called unbiased over the randomization
set for estimating $\tau$. We now show that
given randomly assigned treatments, the
average E - C difference is an unbiased
estimate of $\tau$, the typical causal effect for
the 2N trials.

By the definition of random assignment, 
each trial is equally likely to be an E trial 
as a C trial. Hence, the contribution of the
jth trial (j = 1, ..., 2N) to the average
E - C difference in half of the r allocations
in the randomization set is $y_j(E)/N$
and in the other half is $-y_j(C)/N$. The
expected contribution of the jth trial to the
average E - C difference is therefore


$$\frac{1}{2} [y_j(E)/N] + \frac{1}{2} [-y_j(C)/N].$$

Summing over all 2N trials, we have that the
expectation of the average E - C difference
over the r allocations in the randomization
set is


$$\frac{1}{2N} \sum_{j=1}^{2N} [y_j(E) - y_j(C)],$$

which is the typical causal effect for the 2N
trials, $\tau$.
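
The unbiasedness result can be verified by brute force for a small completely randomized experiment. This is a sketch only: the potential outcomes below are invented for illustration (in practice only one of each pair is ever observed), and exact rational arithmetic is used so the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes for 2N = 4 trials (illustrative values).
y_E = [38, 30, 25, 41]   # y_j(E)
y_C = [34, 29, 27, 33]   # y_j(C)
n_trials = len(y_E)
N = n_trials // 2

# Typical causal effect tau: mean of the individual causal effects.
tau = Fraction(sum(e - c for e, c in zip(y_E, y_C)), n_trials)

def avg_diff(e_set):
    """Average E - C difference that would be observed under allocation e_set."""
    e_mean = Fraction(sum(y_E[j] for j in e_set), N)
    c_mean = Fraction(sum(y_C[j] for j in range(n_trials) if j not in e_set), N)
    return e_mean - c_mean

# One difference per allocation in the completely randomized set.
diffs = [avg_diff(set(s)) for s in combinations(range(n_trials), N)]

# Expectation over the equally likely allocations.
expectation = sum(diffs) / len(diffs)

print("tau:", tau)                    # 11/4
print("expectation:", expectation)    # 11/4
print("individual differences:", [str(d) for d in diffs])
```

The individual differences scatter widely around $\tau$ in this illustration, which is consistent with the caution that unbiasedness alone says little about how close any single observed difference is.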

Although the unbiasedness of the E - C
difference is appealing in the sense that it
indicates that we are tending to estimate
$\tau$, its impact is not immediately overwhelming:
the one E - C difference we have
observed, $\bar{y}_d$, may or may not be close to
$\tau$. In a vague sense we may believe $\bar{y}_d$
should be close to $\tau$ because the unbiasedness
indicates that “on the average” the
E - C difference is $\tau$, but this belief may
be tempered when other properties of the
estimate are revealed; for example, without
additional constraints on the symmetry of
effects, the average E - C difference is not
equally likely to be above $\tau$ as below it.

In addition, after observing the values 
of some important unmatched variable, we 
may no longer believe $\bar{y}_d$ tends to estimate
$\tau$. For example, suppose in the study of
reading programs, initial reading score is not
a matching variable, and after the experiment
is complete we find that the average
initial score for the children exposed to E
was higher than for those exposed to C.
Clearly we would now believe that $\bar{y}_d$ probably
overestimates $\tau$ even if treatments were
randomly assigned. 

In sum, the unbiasedness of the E - C
difference for $\tau$ follows from the random
assignment of treatments; it is a desirable
property because it indicates that “on the
average” we tend to estimate the correct
quantity, but it hardly solves the problem
of estimating the typical causal effect. As
yet we have no indication whether to believe
$\bar{y}_d$ is close to $\tau$, nor any ability to adjust
for important information we may possess.


Probabilistic Statements from the 
Randomization Set 


A second formal advantage of randomiza- 
tion is that it provides a mechanism for 
making precise probabilistic statements 
indicating how unusual the observed E − C
difference, $\bar{y}_d$, would be under specified
hypotheses. Suppose that the investigator
hypothesizes exactly what the individual
causal effects are for each of the 2N trials
and these hypothesized values are $\tilde{\tau}_j$, j =
1, ..., 2N. The hypothesized typical causal
effect for the 2N trials is thus

$$\tilde{\tau} = \frac{1}{2N} \sum_{j=1}^{2N} \tilde{\tau}_j.$$

Having the $\tilde{\tau}_j$ and the observed $y_j(E)$, $j \in S_E$,
and $y_j(C)$, $j \in S_C$, we can calculate hypothe-
sized values, say $\tilde{y}_j(C)$ and $\tilde{y}_j(E)$, for all of
the 2N trials. For $j \in S_E$, $y_j(E)$ is observed
and $y_j(C)$ is unobserved; hence, for these
trials $\tilde{y}_j(E) = y_j(E)$ and $\tilde{y}_j(C) = y_j(E) - \tilde{\tau}_j$.
For $j \in S_C$, $y_j(C)$ is observed and $y_j(E)$
is unobserved; hence, for these trials $\tilde{y}_j(C) =
y_j(C)$ and $\tilde{y}_j(E) = y_j(C) + \tilde{\tau}_j$. Thus, we
can calculate hypothesized $y_j(E)$ and $y_j(C)$
for all 2N trials, and using these, we can
calculate a hypothesized average E − C
difference for each of the r allocations of the
2N trials in the randomization set.

Suppose that we calculate the r hypothe-
sized average E − C differences and order
them from high to low, noting which
E − C difference corresponds to the $S_E$, $S_C$
allocation we have actually observed. This
difference, $\bar{y}_d$, is the only one which does not
use the hypothesized $\tilde{\tau}_j$. If treatments were
assigned completely at random to the trials
and the hypothesized $\tilde{\tau}_j$ are correct, any one
of the $r = \binom{2N}{N}$ differences was equally
likely to be the observed one; similarly, if
treatments were randomly assigned within
matched pairs, each of the $r = 2^N$ differences
with each member of a matched pair receiving
a different treatment was equally likely to
be the observed one. Intuitively, if the
hypothesized $\tilde{\tau}_j$ are essentially correct, we
would expect the observed difference $\bar{y}_d$ to
be rather typical of the (r − 1) other differ-
ences that were equally likely to be observed;
that is, $\bar{y}_d$ should be near the center of the
distribution of the r E − C differences.
If the observed difference is in the tail of the
distribution and therefore not typical of
the r differences, we might doubt the cor-
rectness of the hypothesized $\tilde{\tau}_j$.

Since the average of the r E − C dif-
ferences is the hypothesized typical causal
effect, $\tilde{\tau}$, and the r allocations are equally
likely, we can make the following probabilis- 
tic statement: 


Under the hypothesis that the causal effects are
given by the $\tilde{\tau}_j$, j = 1, ..., 2N, the probability that
we would observe an average E − C difference
that is as far or farther from $\tilde{\tau}$ than the one we
have observed is m/r where m is the number of
allocations in the randomization set that yield
E − C differences that are as far or farther from
$\tilde{\tau}$ than $\bar{y}_d$.


If this probability, called the "significance
level" for the hypothesized $\tilde{\tau}_j$, is very small,
we either must admit that the observed
value is unusual in the sense that it is in the
tail of the distribution of the equally likely
differences, or we must reject the plausi-
bility of the hypothesized $\tilde{\tau}_j$.
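
The significance level m/r described above can be computed directly by enumeration when 2N is small. The following is a minimal sketch under stated assumptions: treatments are assigned completely at random with N trials to E, the data and the function name `significance_level` are hypothetical, and the hypothesized individual effects are supplied by the caller. It fills in the hypothesized outcomes for every trial, computes the hypothesized E − C difference for each allocation, and counts the allocations that are as far or farther from the hypothesized typical effect than the observed one.

```python
from fractions import Fraction
from itertools import combinations


def significance_level(observed, e_trials, tau_hyp):
    """Return (m, r) for hypothesized individual effects tau_hyp.

    observed[j] is the one outcome actually seen for trial j;
    e_trials is the set of trials that actually received E.
    """
    total = len(observed)
    n = len(e_trials)
    # Fill in both hypothesized outcomes for every trial.
    y_e = [observed[j] if j in e_trials else observed[j] + tau_hyp[j]
           for j in range(total)]
    y_c = [observed[j] - tau_hyp[j] if j in e_trials else observed[j]
           for j in range(total)]
    tau_bar = Fraction(sum(tau_hyp), total)

    def difference(e_set):
        mean_e = Fraction(sum(y_e[j] for j in e_set), n)
        mean_c = Fraction(sum(y_c[j] for j in range(total) if j not in e_set), n)
        return mean_e - mean_c

    # For the observed allocation this uses only observed values.
    observed_distance = abs(difference(e_trials) - tau_bar)
    allocations = [set(c) for c in combinations(range(total), n)]
    m = sum(1 for a in allocations
            if abs(difference(a) - tau_bar) >= observed_distance)
    return m, len(allocations)


# Hypothetical data: 2N = 6 trials, trials 0, 1, 2 received E.
observed = [14, 12, 15, 8, 9, 7]
m, r = significance_level(observed, {0, 1, 2}, tau_hyp=[0] * 6)
print(f"significance level = {m}/{r}")
```

For these illustrative data and the hypothesis of no effect, only the observed allocation and its mirror image are this extreme, so the level is 2/20. Enumeration grows quickly with N; for larger studies one would presumably sample allocations rather than list them all, which is a departure from the exact calculation described in the text.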

The most common hypothesis for which a 
significance level is calculated is that the 
E versus C treatment has no effect on Y 
whatsoever (i.e., $\tilde{\tau}_j = 0$). Other common
hypotheses assume that the effect of the
E versus C treatment on Y is a nonzero
constant (i.e., $\tilde{\tau}_j = \tau_0$) for all trials."
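
Constant-effect hypotheses can also be scanned over a range of values, which is the idea behind the confidence limits mentioned in the accompanying footnote. The sketch below is an illustration only: the data, the grid of candidate values, the level `alpha`, and the function name are all hypothetical, and complete randomization with N trials per treatment is assumed. It keeps every candidate constant effect whose significance level is at least `alpha`.

```python
from fractions import Fraction
from itertools import combinations


def constant_effect_level(observed, e_trials, tau0):
    """Significance level m/r under the hypothesis tau_j = tau0 for all trials."""
    total, n = len(observed), len(e_trials)
    # Under a constant effect, removing tau0 from the E outcomes gives every
    # trial's hypothesized C outcome; the E outcome is that value plus tau0.
    y_c = [observed[j] - tau0 if j in e_trials else observed[j]
           for j in range(total)]

    def distance(e_set):
        sum_e = sum(y_c[j] for j in e_set)
        sum_c = sum(y_c) - sum_e
        # (E - C difference) - tau0 reduces to the difference in y_c means.
        return abs(Fraction(sum_e - sum_c, n))

    observed_distance = distance(e_trials)
    allocations = list(combinations(range(total), n))
    m = sum(1 for a in allocations if distance(a) >= observed_distance)
    return Fraction(m, len(allocations))


observed = [14, 12, 15, 8, 9, 7]   # hypothetical outcomes
e_trials = {0, 1, 2}               # hypothetical observed allocation
alpha = Fraction(1, 5)             # hypothetical level
grid = range(-2, 15)               # hypothetical grid of candidate tau0 values
kept = [t for t in grid
        if constant_effect_level(observed, e_trials, t) >= alpha]
print(min(kept), max(kept))
```

With only 20 allocations the attainable significance levels are multiples of 1/20, so very small values of `alpha` would retain every candidate; a coarse integer grid is used here purely to keep the example short.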

The ability to make precise probabilistic
statements about the observed $\bar{y}_d$ under
various hypotheses without additional as-
sumptions is a tremendous benefit of ran-
domization, especially since $\bar{y}_d$ tends to
estimate $\tau$. However, one must realize that
these simple probabilistic statements refer
only to the 2N trials used in the study and
do not reflect additional information (i.e.,
other variables) that we may also have
measured.


PRESENTING THE RESULTS OF AN 
EXPERIMENT AS BEING OF 
GENERAL INTEREST 


Before presenting the results of an experi-
ment as being relevant, an investigator
should believe that he has measured the


"These hypotheses for a constant effect can be
used to form “confidence limits" for $\tau$. Given
that the $\tau_j$ are constant, the set of all hypothesized
$\tau_0$ such that the associated significance level is
greater than or equal to $\alpha = m/r$ form a $(1 - \alpha)$
confidence interval for $\tau$: of the r such $(1 - \alpha)$ con-
fidence intervals one could have constructed (one
for each of the r allocations in the randomization
set), $r(1 - \alpha) = r - m$ of them include the true
value of $\tau$ assuming all $\tau_j = \tau$. See Lehmann (1959,
p. 59) for the proof.
causal effect of the E versus C treatment
and not the effect of some extraneous varia- 
ble. Also, he should believe that the result is 
applicable to a population of trials besides 
the 2N in the experiment. 


Considering Additional Variables 

As indicated previously, the investigator 
should be prepared to consider the possible 
effect of other variables besides those explicit 
in the experiment. Often additional variables 
will be ones that the investigator considers 
relevant because they may causally affect Y;
therefore, he may want to adjust the esti-
mate $\bar{y}_d$ and significance levels of hypotheses
to reflect the values of these variables in 
the study. At times the variables will be 
ones which cannot causally affect Y even 
though in the study they may be correlated 
with the observed values of Y. An investiga- 
tor who refuses to consider any additional 
variables is in fact saying that he does not 
care if $\bar{y}_d$ is a bad estimate of the typical
causal effect of the E versus C treatment 
but instead is satisfied with mathematical 
properties (i.e., unbiasedness) of the process 
by which he calculated it. 

Consider first the case of an obviously 
important variable. As an example, suppose 
in the reading study, with programs ran- 
domly assigned, we found that the average 
E — C difference in final score was four 
items correct and that under the hypothesis 
of no effects the significance level was .01; 
also assume that initial score was not a 
matching variable and in fact the difference 
in initial score was also four items correct. 
Admittedly, this is probably a rare event 
given the randomization, but rare events do 
happen rarely. Given that it did happen, 
we would indeed be foolish to believe that $\bar{y}_d = 4$
items is a good estimate of $\tau$ or to accept the
implausibility of the hypothesis of no treat-
ment effects indicated by the .01 significance
level. Rather, it would seem more sensible
to believe that $\bar{y}_d$ overestimates $\tau$ and that
significance levels underestimate the plausi-
bility of hypotheses that suggest zero or
negative values for $\tau$.

A commonly used and obvious correction 
is to calculate the average E — C difference 
in gain score rather than final score. That 
is, for each trial there is a “pretest” score 
which was measured before the initiation of
treatments, and the gain score for each trial
is the final score minus the pretest score.
More generally we will speak of a “prior”
score or “prior” variable which would have
the same value, $x_j$, whether the j-th unit
received E or C. It then follows given ran-
dom assignment of treatments that the
adjusted estimate (e.g., gain score)

$$\frac{1}{N} \sum_{j \in S_E} [y_j(E) - x_j] - \frac{1}{N} \sum_{j \in S_C} [y_j(C) - x_j]$$

remains an unbiased estimate of $\tau$ over
the randomization set: Each prior score
appears in half of the equally likely allo-
cations as $x_j/N$ and the other half as
$-x_j/N$; hence, averaged over all alloca-
tions, the j-th prior score has no effect.’ But
this result holds for any set of prior scores
$x_j$, j = 1, ..., 2N, whether sensible or not.
For example, in an experiment evaluating a 
compensatory reading program, with Y 
being the final score on a reading test, the 
prior variable “pretest reading score” or 
perhaps “IQ” properly scaled makes sense 
but “height in millimeters” does not. Also, 
why not use the prior variable “one half 
pretest score?” 
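
The point that unbiasedness over the randomization set cannot distinguish a sensible prior variable from a senseless one can be seen in a small enumeration. This is a sketch under stated assumptions: all numbers, including the "pretest" and "height" values, are hypothetical, complete randomization with N trials per treatment is assumed, and the function name is ours. For each choice of prior scores it averages the adjusted E − C difference over all allocations and also reports the range of the adjusted differences.

```python
from fractions import Fraction
from itertools import combinations


def mean_adjusted_difference(y_e, y_c, x):
    """Average, over all allocations, of the E - C difference in (y - x)."""
    total = len(y_e)
    n = total // 2
    diffs = []
    for e_trials in combinations(range(total), n):
        e_set = set(e_trials)
        adj_e = sum(y_e[j] - x[j] for j in e_set)
        adj_c = sum(y_c[j] - x[j] for j in range(total) if j not in e_set)
        diffs.append(Fraction(adj_e - adj_c, n))
    return sum(diffs) / len(diffs), max(diffs) - min(diffs)


# Hypothetical potential outcomes for 2N = 6 trials.
y_e = [12, 9, 15, 7, 11, 10]
y_c = [10, 9, 11, 8, 6, 10]
tau = Fraction(sum(e - c for e, c in zip(y_e, y_c)), len(y_e))

# Hypothetical prior variables, fixed before treatment.
priors = {
    "no adjustment": [0] * 6,
    "pretest score": [9, 8, 11, 7, 6, 9],
    "height in mm": [1310, 1275, 1402, 1298, 1350, 1266],
}
for name, x in priors.items():
    mean_diff, spread = mean_adjusted_difference(y_e, y_c, x)
    print(name, mean_diff, spread)
```

Every row reports the same average, equal to the typical causal effect, which is exactly why unbiasedness alone gives no guidance on which prior variable to use. The second number printed for each row, the range of the adjusted differences across allocations, is one way to see that the choices are nonetheless not equally useful.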

Clearly, in order to make an intelligent 
adjustment for extra information, we cannot 
be guided solely by the concept of unbiased- 
ness over the randomization set. We need 
some model for the effect of prior variables 
in order to use their values in an intelligent 
manner. For example, if the final score 
typically would equal the initial score if there 
were no E — C treatment effect (as with the 
length of the alloys in the heat expansion 
experiment), the gain score is perfectly 
reasonable. In the physical sciences, more 
complex models representing generally ac- 
cepted functional relationships are often 
used; however, in the social sciences there 
are rarely such accepted relationships upon 
which to rely. What then does the investiga- 
tor do in order to adjust intelligently the 
final reading scores for the subjects’ varying 
IQs, grade levels, socioeconomic status, and 


"If the prior score could vary depending on 
whether the unit received E or C (i.e., it is a
variable measured after the initiation of the treat- 
ment), we would have no assurance that the ad- 
justed E — C difference is an unbiased estimate 
over the randomization set.
so on? Apparently, he must be willing to
make some assumptions about the func- 
tional form of the causal effect of these 
other variables on Y. If he assumes, perhaps 
based on indications in previous data, some
“known” function for $x_j$ (e.g., in the com-
pensatory reading program example, suppose
$x_j$ equals [.01 × IQ] × pretest × [per-
centile of family income]) so that $x_j$ is
the same whether the j-th unit received E or
C, from the previous discussion the average
E − C difference in adjusted scores remains
an unbiased estimate of $\tau$. If the investiga-
tor assumes a model whose parameters are 
unknown and estimates these parameters 
by some method from the data, in general 
the average E — C difference in adjusted 
scores is no longer unbiased over the ran- 
domization set because the adjustment for 
the j-th trial depends on which trials received
E and which received C (e.g., in the analysis 
of covariance, the estimated regression 
coefficients in general vary over the r 
allocations in the randomization set). 
Hence, forming an intelligent adjusted 
estimate may not be simple even in a ran- 
domized experiment. 

Significance levels for any adjusted esti- 
mate can be found by calculating the ad- 
justed estimate rather than the simple 
E — C difference for each of the equally 
likely allocations in the randomization set. 
However, if the adjusted estimate does not 
tend to estimate $\tau$ in a sensible manner, the
resulting significance level may not be of 
much interest. 

Now consider a variable that is brought 
to the investigator’s attention, but he feels 
it cannot causally affect Y (e.g., in the
compensatory reading example, age of oldest 
living relative). Eventually a skeptic can 
find such a variable that systematically 
differs in the E trials and the C trials even 
in the best of experiments. Considering only 
that variable, it is indeed unlikely given 
randomization that there would be such 
discrepancy between its values in E trials 
and C trials, but its occurrence cannot be 
denied. If the skeptic adjusts $\bar{y}_d$ by using a
standard model (e.g., covariance), the 
adjusted estimate and related significance 
levels may then give misleading results 
(e.g., zero estimate of $\tau$, hypothesis that
all causal effects are zero, $\tilde{\tau}_j = 0$, is very
plausible). In fact, using such models one 
can obtain any estimated causal effect 
desired by searching for and finding a prior 
variable or combination of prior variables 
that yield the desired result. Such a search 
should be more difficult given that ran- 
domization was performed, but even with 
randomized data the investigator must be 
prepared to ignore variables that he feels 
cannot causally affect Y. On the other 
hand, he may want to adjust for such a 
variable if he feels it is a surrogate for an 
unmeasured variable that can causally 
affect Y (e.g., age of oldest living relative 
is a surrogate for mental stability of the 
family in the compensatory reading ex- 
ample). 

The point of this discussion is that when 
trying to estimate the typical causal effect in 
the 2N trial experiment, handling additional 
variables may not be trivial without a well- 
developed causal model that will properly 
adjust for those prior variables that causally 
affect Y and ignore other variables that do
not causally affect Y even if they are highly
correlated with the observed values of Y. 
Without such a model, the investigator must 
be prepared to ignore some variables he 
feels cannot causally affect Y and use a 
somewhat arbitrary model to adjust for
those variables he feels are important. An
example which demonstrates that it is not
always simple to interpret significant results
in a randomized experiment with many prior
variables recorded is the recent controversy
over the utility of oral-diabetic drugs.?


Generalizing Results to Other Trials 


In order to believe that the results of an 
experiment are of practical interest, we 
generally must believe that the 2N trials 
in the study are representative of a popula- 
tion of other future trials. For example, if 
the experimental treatment is a compensa- 
tory reading program and the trials are 
composed of sixth-grade school children 
with treatments initiated in fall 1970 and Y 
measured in spring 1971, the results are of 
little interest unless we believe they tell us 
something about future sixth graders who


*See for example Schor (1971) and Cornfield 
(1971).
might be exposed to this compensatory
reading program.

For simplicity, assume the 2N trials in
the study are a simple random sample from
a “target population" of M trials to which
we want to generalize the results; by simple
random sample we mean that each of the
$\binom{M}{2N}$ ways of choosing the 2N trials is
equally likely to be selected. If T is the
typical (average) causal effect for all M
trials, it then follows given random assign-
ment of treatments that the average E − C
difference for the 2N trials used is an un-
biased estimate of T over the random
sampling plan and over the randomization
set. In other words, in each of the
$r \times \binom{M}{2N}$ ways of choosing 2N trials from M
trials and then randomly assigning N trials
to E and N trials to C, there is a calculated
average E − C difference, and the average
of these $r \times \binom{M}{2N}$ differences is T. Be-
cause of the randomization and random
sampling, each trial is equally likely to be
an E trial as a C trial and thus contributes
$y_j(E)/N$ to the E − C difference as often
as it contributes $-y_j(C)/N$. It also follows
that under a hypothesized set of causal
effects, $\tilde{\tau}_j$, j = 1, ..., M, the significance
level (the probability that we would observe
a difference as large as or larger than $\bar{y}_d$),
given that we have sampled the 2N trials
in the study, is m/r where m is the number
of allocations in the randomization set that
yield estimates as far or farther from $\tilde{\tau}$ than
$\bar{y}_d$.
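
The two-stage argument (random sampling of 2N trials from M, then random assignment within the sample) can likewise be verified by enumeration for a very small population. The sketch below is illustrative only: the population outcomes are hypothetical, M = 7 and 2N = 4 are chosen to keep the enumeration short, and the function name is ours.

```python
from fractions import Fraction
from itertools import combinations


def grand_average_difference(y_e, y_c, sample_size):
    """Average E - C difference over every sample and every allocation."""
    population = range(len(y_e))  # the M trials
    n = sample_size // 2
    total, count = Fraction(0), 0
    for sample in combinations(population, sample_size):
        for e_trials in combinations(sample, n):
            c_trials = [j for j in sample if j not in e_trials]
            total += Fraction(
                sum(y_e[j] for j in e_trials) - sum(y_c[j] for j in c_trials), n
            )
            count += 1
    return total / count, count


# Hypothetical potential outcomes for a target population of M = 7 trials.
y_e = [12, 9, 15, 7, 11, 10, 13]
y_c = [10, 9, 11, 8, 6, 10, 9]

# Typical causal effect T for the whole population.
T = Fraction(sum(e - c for e, c in zip(y_e, y_c)), len(y_e))
average, count = grand_average_difference(y_e, y_c, sample_size=4)
print(count, T, average)
```

Here there are 35 possible samples and 6 allocations within each, 210 equally likely outcomes in all, and the average of the 210 differences equals T.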

If we let M grow to infinity (a reasonable 
assumption in many experiments when the 
population to which we want to generalize 
results is essentially unlimited, for example, 
all future sixth-grade students), some addi- 
tional probabilistic results follow. For ex- 
ample, the usual covariance adjusted esti- 
mate is an unbiased estimate of T (not
necessarily $\tau$) over the random sampling
plan and the randomization set, but whether 
the adjustment actually adjusts for the 


e Even though we have hypothesized $\tilde{\tau}_j$ for all
trials, we cannot calculate hypothesized $\tilde{y}_j(E)$ and
$\tilde{y}_j(C)$ for the unsampled trials, and thus the proba-
bility statement is conditional on the observed
trials.
additional variable(s) still depends on the
appropriateness of the underlying linear 
model. 

Hence, given random sampling of trials, 
the ability to generalize results to other 
trials seems relatively straightforward proba- 
bilistically. However, most experiments are 
designed to be generalized to future trials; 
we never have a random sample of trials 
from the future but at best a random 
sample from the present; in fact, experi- 
ments are usually conducted in constrained, 
atypical environments and within a re- 
stricted period of time. Thus, in order to 
generalize the results of any experiment to 
future trials of interest, we minimally must 
believe that there is a similarity of effects 
across time and more often must believe that 
the trials in the study are “representative” 
of the population of trials. This step of faith 
may be called making an assumption of 
“subjective random sampling" in order to 
assert such properties as (a) $\bar{y}_d$ (or $\bar{y}_d$ ad-
justed) tends to estimate the typical causal
effect T and (b) the plausibility of hypothe-
sized $\tilde{\tau}_j$, j = 1, ..., M, is given by the usual
conditional significance level. 

Even though the trials in an experiment 
are often not very representative of the 
trials of interest, investigators do make and 
must be willing to make this assumption of 
subjective random sampling in order to 
believe their results are useful. When in- 
vestigators carefully describe their sample of 
trials and the ways in which they may differ 
from those in the target population, this 
tacit assumption of subjective random 
sampling seems perfectly reasonable. If
there is an important variable that differs 
between the sample of trials and the popula- 
tion of trials, an attempt to adjust the 
estimate based on the same kinds of models 
discussed previously is quite appropriate. 


PRESENTING THE RESULTS OF AN 
NONRANDOMIZED STUDY AS
BEING OF GENERAL
INTEREST 


The same two issues previously discussed 
as arising when presenting the results of an 


"See Cochran (1963) on regression and ratio 
adjustments. These are appropriate whether the 
sample is actually random or not.
experiment also arise when presenting the
results of a nonrandomized study as being 
relevant. However, the first issue, the effect 
of variables not explicitly controlled, is 
usually more serious in nonrandomized than 
in randomized studies, while the second, 
the applicability of the results to a popula- 
tion of interest, is often more serious in 
randomized than in nonrandomized studies. 


Effect of Variables Not Explicitly Controlled 


In order to believe that $\bar{y}_d$ in a nonran-
domized study is a good estimate of $\tau$,
the typical causal effect for the 2N trials in 
the study, we must believe that there are no 
extraneous variables that affect Y and 
systematically differ in the E and C groups; 
but we have to believe this even in a ran- 
domized experiment. The primary difference 
is that without randomization there is often 
a strong suspicion that there are such varia- 
bles, while with randomization such suspi- 
cions are generally not as strong. 

Consider a carefully controlled nonran- 
domized study—a study in which there are 
no obviously important prior variables that 
systematically differ in the E trials and the 
C trials. In such a study, there is a real 
sense in which a claim of "subjective ran- 
domization" can be made. For example, if 
the study was composed of carefully 
matched pairs of trials, there might be a 
very defensible belief that within each 
matched pair each unit was equally likely to 
receive E as C in the sense that if you were 
shown the units without being told which 
received E, only half the time would you 
correctly guess which received E. Under 
this assumption of subjective randomiza- 
tion, the usual estimates and significance 
levels can be used as if the study had been 
randomized; this procedure is analogous to 
assuming subjective random sampling in 
order to make inferences about a target 
population. Until an obviously important 
variable is found that systematically differs 
in the E and C trials, the belief in subjective 
randomization is well founded. i 

Now consider a nonrandomized study in 


"Perhaps this is all that is meant by "ran- 
domization" to some Bayesians under any circum- 
stance (see Savage, 1954, p. 66).
which an obviously important prior variable
is found that systematically differs in the E
and C trials. We must adjust the estimate
$\bar{y}_d$ and the associated significance levels just
as we would if the study were in fact a 
properly randomized experiment. An ob- 
vious way to adjust for such variables is to 
assume subjective randomization (i.e., the 
study was randomized and the observed 
difference on prior variables occurred “by 
chance"), and use the methods discussed in 
the previous section “Considering Addi- 
tional Variables" appropriate for an experi- 
ment (i.e., gain scores, adjustment by a
known function, covariance adjustment). 

The main problem with this approach is 
that having found an important prior varia- 
ble that systematically differs in the E and C 
trials, we might suspect that there are other 
such variables, while if the study were 
randomized we might not be as suspicious of 
finding these prior variables. Additionally, 
the various methods of adjustment that 
yield unbiased estimates given randomiza- 
tion have varying biases under different 
models without randomization. Even though 
an unbiased estimate is not (as we have 
seen) the total answer to estimating $\tau$, it
is more desirable than a badly biased esti- 
mate. Recent work on methods of reducing 
bias in nonrandomized studies is summarized 
in Cochran and Rubin (1974). Much work 
remains to be done, especially for many prior 
variables and nonlinear relations between 
these and Y. 

In sum, with respect to variables not 
explicitly controlled, a randomized study 
leaves the investigator in a more comfortable 
position than does a nonrandomized study. 
Nevertheless, the following points remain 
true for both: (a) Any adjustment is some- 
what dependent upon the appropriateness 
of the underlying model—if the model is 
appropriate the confounding effect of the 
prior variables is reduced or eliminated, 
while if the model is inappropriate a con- 
founding effect remains. (b) We can never 
know that all causally relevant prior varia- 
bles that systematically differ in the E and 
C trials have been controlled. (c) We must 
be prepared to ignore irrelevant prior varia- 
bles even if they systematically differ in E 
and C trials, or else we can obtain any
estimate desired by eventually finding the
"right" irrelevant prior variables. 


Generalizing Results to Other Trials 


For almost any study to be of interest, 
the results must be generalizable to a popula- 
tion of trials. Typically, nonrandomized 
studies have more representative trials than
experiments since these are often conducted 
in constrained environments. Thus, if the 
choice is between a nonrandomized study 
whose 2N trials consisted of N representative 
E trials closely matched to N representative 
C trials and an experiment whose 2N trials 
were highly atypical, it is not clear which 
we should prefer; in practice there may be a 
trade-off between the reasonableness of the 
assumptions of subjective random sampling 
and subjective randomization (e.g., con- 
sider a carefully matched nonrandomized 
evaluation of existing compensatory reading 
programs and an experiment having these 
compensatory reading programs randomly 
assigned to inmates at a penitentiary). 

In a sense, all studies lie on a continuum 
from irrelevant to relevant with respect to 
answering a question. A poorly controlled 
nonrandomized study conducted on atypical 
trials is barely relevant, but a small ran- 
domized study with much missing data 
conducted on the same atypical trials is not 
much better. Similarly, a very well-con- 
trolled experiment conducted on a repre- 
sentative sample of trials is very relevant, 
and a very well-controlled nonrandomized 
study (e.g., E and C trials matched on all 
causally important variables, several control 
groups each with a potentially different 
bias) conducted on a representative sample 
of trials is almost as good. Typically, real- 
world studies fall somewhere in the middle 
of this continuum with nonrandomized 
studies having more representative trials 
than experiments but less control over prior 
variables. 


SUMMARY


The basic position of this paper can be 
summarized as follows: estimating the 
typical causal effect of one treatment versus 
another is a difficult task unless we under- 
stand the actual process well enough to (a) 
assign most of the variability in Y to specific
causes and (b) ignore associated but causally
irrelevant variables. Short of such under- 
standing, random sampling and randomiza- 
tion help in that all sensible estimates tend 
to estimate the correct quantity, but these 
procedures can never completely assure us 
that we are obtaining a good estimate of the 
treatment effect.* 

Almost never do we have a random sample 
from the target population of trials, and 
thus we must generally rely on the belief in 
subjective random sampling, that is, there 
is no important variable that differs in the 
sample and the target population. Similarly, 
often the only data available are observa- 
tional and we must rely on the belief in 
subjective randomization, that is, there is 
no important variable that differs in the E 
trials and C trials. With or without random 
sampling or randomization, if an important 
prior variable is found that systematically 
differs in E and C trials or in the sample and 
target population, we are faced with either 
adjusting for it or not putting much faith in 
our estimate. However, we cannot adjust 
for any variable presented, because if we 
do, any estimate can be obtained. 

In both randomized and nonrandomized 
studies, the investigator should think hard 
about variables besides the treatment that 
may causally affect Y and plan in advance 
how to control for the important ones— 
either by matching or adjustment or both. 
When presenting the results to the reader, 
it is important to indicate the extent to 
which the assumptions of subjective ran- 
domization and subjective random sampling 
can be believed and what methods of control 
have been employed.“ If a nonrandomized 
study is carefully controlled, the investigator 
can reach conclusions similar to those he 


" Even assuming a good estimate of the causal 
effect of E versus C, there remains the problem 
of determining which aspects of the treatments are 
responsible for the effect. Consider, for example, 
"expectancy" effects in education (Rosenthal, 1971) 
and the associated problems of deciding the relative 
causal effects of the content of programs and the 
implementation of programs. 

“Recent advice on the design and analysis of 
observational studies is given by Cochran in Ban- 
croft (1972).
would reach in a similar experiment.5 In
fact, if the effect of the E versus C treatment 
is large enough, he will be able to detect it in 
small, nonrepresentative samples and poorly 
controlled studies. 

Basic problems in social science research 
are that causal models are not yet well 
formulated, there are many possible treat- 
ments, and in many cases the differential 
effects of treatments appear to be quite 
small. Given this situation, it seems reasona- 
ble to (a) search for treatments with large 
effects using well-controlled nonrandomized 
studies when experiments are impractical 
and (b) rely on further experimental study 
for more refined estimates of the effects of 
those treatments that appear to be im- 
portant. The practical alternative to using 
nonrandomized studies in this way is evalu- 
ating many treatments by introspection 
rather than by data analysis.

TO ELEMENTARY READING ABILITIES'

Good and poor readers from Grades 2 to 5 were compared on three
cognitive style dimensions (conceptual style preferences, cognitive
tempos, and attentional styles), which were assessed with the Con-
ceptual Styles Test, Matching Familiar Figures Test, and Fruit Dis-
traction Test. Attentional style measures distinguished poor and good
readers better than the other cognitive style measures. However,
another set of scores aligned with but not identical to the attentional
style measures were even more highly related to reading. These scores
reflected children’s skills at sequentially transposing information from
visual to verbal channels. The results are considered in light of the
maturational lag hypothesis; alternative accounts are proposed for
the changing patterns of deficits between younger and older poor
readers.


Elementary grade children show dramatic 
developmental advances along several cog- 
nitive style dimensions (Goodenough & 
Eagle, 1963; Kagan, 1966; Santostefano & 
Paley, 1964). The aim of the present study 
was to evaluate the relationship of three 
such dimensions—conceptual style prefer- 
ences, cognitive tempos, and attentional 
styles—to reading abilities. 

Conceptual style preference (Kagan, 
1966) refers to the type of conceptual rela- 
tionships among objects typically formed 
by the child. Three such preferences that 
have been identified are analytic, relational, 
and inferential conceptualization. An ana- 
lytic concept is one which is formed on the 
basis of an objective element common to all 
members included in the group (e.g., they 
all have sharp points). A relational concept 
is based upon a functional relationship 


1 This study was supported by a National Insti-
tute of Mental Health Small Grant (MH 23656-
01). The author acknowledges the assistance of Lee
Smith, principal, and Jean Harmon, special reading
instructor, Tonganoxie Public School, Tonganoxie,
Kansas. The author also thanks Rosemary Wright, 
Brenda Morris, and Mark Devaney, who served as 
research assistants. 

? Requests for reprints should be sent to Doug- 
las R. Denney, Department of Psychology, Univer- 
sity of Kansas, Lawrence, Kansas 66045. 


established among the members (e.g., the 
match is used to light the pipe). An inferen- 
tial concept is based upon the members' be- 
longing to some superordinate class identi- 
fied by a lexical term (e.g., they are both 
tools). 

Increases in preferences for analytic and 
inferential concepts and decreases in rela- 
tional concepts with increasing age are re- 
ported among elementary children (Kagan, 
1966). Kagan has cited increases in the 
child’s tendency to analyze stimuli into 
their component parts as partly responsible 
for the increase in analytic conceptual style. 
Others have seen the shift from a focus upon 
“things that go together” to “things that are 
alike” as reflective of a more basic cognitive 
attainment by the child (Denney, 1971; 
Denney & Lennon, 1972). In either case, 
analytic and inferential conceptual func-
tioning may be related positively to read-
ing ability since visual analysis of words and 
a focus upon words and word parts, which 
are alike and thus perhaps sound alike, 
would both seem to aid reading acquisition, 
particularly within phonemic instructional 
programs. 

The cognitive tempo dimension involves 
the child’s tendency to inhibit initial re- 
sponses and to reflect upon the accuracy of 
his answer rather than responding impul- 
sively, in conditions of high response un- 
certainty. Developmental studies reveal a 
trend from impulsive to reflective cognitive 
tempos (Kagan, 1966). 

The identification of words and the read- 
ing of sentences by children just acquiring 
the ability to read represent situations of 
high response uncertainty within which 
cognitive tempos might be expected to have 
relevance. Using a sample of first-grade 
children, Kagan (1965) found positive cor- 
relations between indices of reflectivity and 
word recognition assessed both at the same 
time and six months after the cognitive 
tempo measures. Negative correlations were 
reported between some of the same indices 
of reflectivity and the errors committed dur- 
ing the oral reading of paragraphs a full 
year after the cognitive tempo measures 
were taken. Kagan recommended that mea- 
sures of cognitive tempo be incorporated in 
diagnostic batteries for the early identifica- 
tion of poor readers. 

Attentional style refers to the child’s 
ability to deploy his attention selectively, 
thereby avoiding distraction from intrusive 
and irrelevant stimulus information. Two 
such styles are constriction and flexibility, 
children with the flexible attentional style 
being less distracted by irrelevant stimuli. 
Santostefano and Paley (1964) showed de- 
velopmental advances along this dimension 
from constricted to flexible attentional 
styles. 

Effective reading would seem to neces- 
sitate the accurate deployment of attention 
and the exclusion of irrelevant and dis- 
tracting information either contained within 
the relevant stimuli (e.g., silent letters) or 
in close proximity to them (e.g., pictures 
and other words on the page). Santostefano, 
Rutledge, and Randall (1965) found con- 
striction-flexibility to be the only one of 
three cognitive style dimensions (the others 
being focusing-scanning and leveling— 
sharpening) to distinguish between poor 
and good readers. 

The present study constitutes an exten- 
sion of both Kagan’s (1965) and Santoste- 
fano et al.’s (1965) studies in that (a) a new 
cognitive style dimension, conceptual style 




preferences, is added; (b) unlike Santoste- 
fano et al.'s study, many specific reading 
skills are assessed separately and the rela- 
tionship of each to the cognitive style di- 
mensions is evaluated; and (c) unlike Ka- 
gan's study, the relevance of each cognitive 
style dimension is investigated over the 
course of the several elementary grades dur- 
ing which reading ability is gradually being 
acquired. 

Reading deficiencies have been thought 
to reflect maturational lags in the Central 
Nervous System, with poor readers mani- 
festing cognitive-perceptual behaviors simi- 
lar to those of younger normal children 
(Satz, Rardin, & Ross, 1971). It follows 
that the deficits distinguishing older poor 
readers from their age-mates should be differ- 
ent from those distinguishing younger poor 
readers from their age-mates. Since dra- 
matic attainments in perceptual-motor and 
attentional skills precede major shifts in 
language and conceptual-symbolic skills, 
deficits in the former skills should be more 
prevalent among younger poor readers and 
deficits in the latter skills more prevalent 
among older poor readers. A number of 
studies comparing poor and normal readers 
at more than one age level lend support to 
the above hypothesis (Braun, 1963; Satz 
et al., 1971). 

The present study employed children 
ranging from the second through the fifth 
grades and included measures reflecting a 
wide variety of cognitive-perceptual behav- 
iors. Thus, it was possible to evaluate the 
results of the study in terms of the shift 
from perceptual-motor and attentional def- 
icits to language, conceptual, and intel- 
lective deficits posited by the maturational 
lag hypothesis. 


METHOD 


Subjects 


The subjects were white, middle-class children 
in the second through the fifth grades of a small, 
rural elementary school. Ten children from each 
grade were designated poor readers on the basis of 
their reading teacher’s judgment; all 40 of these 
children were receiving weekly special reading in- 
struction. The majority of these children were 
boys; however, there was 1 girl from the second 
grade, 2 from both the third and fifth grades, and 
3 from the fourth grade. Ten average or slightly 
above-average readers from each grade were also 
selected, again based upon the reading teacher's 
judgment. Highly advanced readers were excluded 
from the study. Within each grade level, poor and 
normal readers were matched for sex. Beyond these 
constraints, the selection of normal readers to 
compare with the poor readers was done randomly 
from a larger list of average or above-average 
readers. 


Tests 


The 80 subjects were individually administered 
Form C of the Gilmore Oral Reading Test (Gil- 
more & Gilmore, 1968) and four subtests from the 
Gates-McKillop Reading Diagnostic Tests (Gates 
& McKillop, 1962). The Gilmore Oral Reading 
Test consisted of a series of paragraphs, graduated 
in difficulty, which the child read aloud. The ex- 
aminer tallied the number of reading errors com- 
mitted on each passage, the total being converted 
into an accuracy raw score. In addition, the child 
was asked three questions over the material in each 
passage. The total number of questions answered 
correctly for all passages became the child’s com- 
prehension raw score. Finally, the child’s reading 
rate in words per minute was also computed. 

The four subtests from the Gates-McKillop were 
the following: 

Word Pronunciation—Untimed Presentation 
(Whole Words). This test consisted of a list of 80 
words ordered in terms of difficulty. Two attempts 
to read each word were allowed, with points 
awarded for correct pronunciations during the first 
or second trials. 

Recognizing and Blending Common Word Parts 
(Word Parts). Twenty-three nonsense words, each 
composed of two common phonetic elements (e.g., 
SKED: SK, and ED), were presented. Four points 
were awarded for correctly reading the whole word 
on the first trial; if this was not done, one point 
each was awarded for correctly reading the first 
and second elements and for correctly blending 
them in a final pronunciation of the word. 

Recognizing the Visual Forms or Word Equiv- 
alents of Sounds (Nonsense Words). After hear- 
ing each of 20 nonsense words, the child circled 1 
of 4 written words matching the sound he had just 
heard. 

Auditory Blending. The component sounds for 
15 common words were presented at one-fourth-of- 
a-second intervals. The child was awarded one 
point for each word correctly pronounced within 
two attempts. 
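
The Word Parts scoring rule described above can be illustrated with a short sketch. This is not part of the published test materials; the function name and argument names are hypothetical, and the sketch assumes that the three one-point credits are independent of one another.

```python
def score_word_parts_item(whole_word_first_trial: bool,
                          first_element_correct: bool,
                          second_element_correct: bool,
                          blended_correctly: bool) -> int:
    """Score one nonsense-word item on the Word Parts subtest.

    Four points are given for reading the whole word on the first trial.
    Otherwise one point each is given for the first element, the second
    element, and the final blending (assumed here to be independent).
    """
    if whole_word_first_trial:
        return 4
    return (int(first_element_correct)
            + int(second_element_correct)
            + int(blended_correctly))


# Hypothetical child: reads SKED correctly at once, then needs help on a
# second item but reads both elements without blending them.
total = (score_word_parts_item(True, False, False, False)
         + score_word_parts_item(False, True, True, False))
```

Under this reading of the rule, partial credit on an item can never reach the four points given for an immediate correct reading.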

Subjects were also individually administered the 
following cognitive style tests. 

Conceptual Styles Test—Form A (CST-A). This 
test, adapted from Kagan's Conceptual Styles Test 
(Kagan, 1966), provided a measure of children's 
preferences for analytic, relational, and inferential 
concepts. The CST-A was composed of 15 items, 
each containing three pictures from which the sub- 
ject was instructed to choose two “that are alike 
or go together in some way.” Items were designed 
so that one possible pair represented an analytic 
concept and another represented a relational con- 
cept. The child's pointing response and accom- 
panying explanation were categorized as analytic, 
relational, or inferential (Denney, 1971). 

Matching Familiar Figures Test (MFFT). This 
test, designed by Kagan to assess the cognitive 
tempos of reflection and impulsivity, consisted of 
14 match-to-standard items, including 2 practice 
items. Each item consisted of a standard and 6 al- 
ternative stimuli exposed simultaneously. Latency 
to the first response and total number of errors 
were summed across all 12 test items. Both latency 
and error scores were used as indices of cognitive 
tempo (Kagan, 1966). 

Fruit Distraction Test (FDT). The third test 
was adapted from one designed by Santostefano 
and Paley (1964) to assess constricted and flexible 
attentional styles. The test consisted of three cards, 
each containing 50 pictures of bananas, cherries, 
grapes, and carrots ordered randomly on the card 
in 10 rows of 5. On Card 1, the fruits and vege- 
tables were colored appropriately (yellow, red, 
purple, and orange, respectively). Card 2 was iden- 
tical to Card 1 with the addition of a number of 
achromatic drawings of common objects inter- 
spersed among the relevant stimuli. Card 3 was 
identical to Card 1 except that the fruits and vege- 
tables were all colored inappropriately. On Cards 
1 and 2, the child was required to name the colors 
of each fruit and vegetable, and on Card 3, he was 
required to name the color that each fruit and 
vegetable should be, proceeding in all cases from 
left to right down each card. “Reading” times and 
numbers of corrected and uncorrected errors were 
recorded for each card. Following the reading of 
Card 2, the subject was asked to recall as many of 
the achromatic drawings as he could. Differences in 
reading times and errors between Card 2 and Card 
1, and between Card 3 and Card 1, and the num- 
ber of achromatic drawings recalled on Card 2 
constituted five indices of attentional style. 
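
The five attentional style indices can be sketched as follows. The sketch is illustrative only: the class and function names are hypothetical, the direction of subtraction (distraction card minus Card 1) is an assumption, and corrected and uncorrected errors are assumed to be summed into a single error count per card.

```python
from dataclasses import dataclass


@dataclass
class CardScore:
    seconds: float  # "reading" time for the card
    errors: int     # corrected plus uncorrected errors (assumed summed)


def fdt_indices(card1: CardScore, card2: CardScore, card3: CardScore,
                drawings_recalled: int) -> dict:
    """Return the five FDT indices of attentional style.

    Differences are taken as (distraction card - Card 1); the direction
    is an assumption, since the text only says "differences between".
    """
    return {
        "time_2_minus_1": card2.seconds - card1.seconds,
        "errors_2_minus_1": card2.errors - card1.errors,
        "time_3_minus_1": card3.seconds - card1.seconds,
        "errors_3_minus_1": card3.errors - card1.errors,
        "drawings_recalled": drawings_recalled,
    }


# Hypothetical child, not taken from the study's data.
example = fdt_indices(CardScore(100.0, 1), CardScore(110.0, 3),
                      CardScore(150.0, 2), drawings_recalled=4)
```

With this convention, larger positive time or error differences would indicate greater disruption by the distracting material.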

Finally, Form A of the Peabody Picture Vo- 
cabulary Test (PPVT) was administered to each 
child. The PPVT is a test of receptive vocabulary 
correlating highly with other global measures of 
intelligence (Dunn, 1965). 

Three sessions were required to administer all 
the measures to each child. The first two sessions 
were conducted during the middle of the school 
year, with no more than two weeks separating the 
two sessions for any child. During the first session, 
a female examiner administered the Gilmore; dur- 
ing the second, a male examiner administered the 
CST-A, MFFT, FDT, and Gates-McKillop tests. 
The PPVT was administered by a female examiner 
during a third session held at the end of the school 
year. 


Design 


The study conformed to a 4 X 2 (Grade X 
Reading Level) factorial design with 10 subjects in 
each cell. To control for intellectual differences 
among the subjects in the evaluation of the read- 
ing and cognitive style data, an analysis of co- 
variance was employed, using PPVT IQ scores as 
the covariate. Similarly, in correlating reading skill 
and cognitive style measures, intellective differ- 
ences were removed through partial correlation. 
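
As an illustration of removing intellective differences through partial correlation, the sketch below uses the standard first-order partial correlation formula. The function names are hypothetical and the data are invented for the example; they are not the study's scores.

```python
import math


def pearson_r(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y with the covariate z partialed out."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Invented scores: "reading" and "style" are related only through "iq".
iq = [1, 2, 3, 4]
reading = [2, 1, 2, 5]
style = [2, -1, 6, 3]
raw = pearson_r(reading, style)            # about .33
adjusted = partial_r(reading, style, iq)   # about 0
```

In this constructed case the raw correlation is entirely attributable to the covariate, so it vanishes once the covariate is partialed out.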


RESULTS 


Analysis of the Reading Measures 


A 4 X 2 (Grade X Reading Level) analy- 
sis of covariance was performed on each of 
the seven reading scores derived from the 
Gilmore Oral Reading Test and the Gates- 
McKillop Reading Diagnostic Tests. Table 
1 presents the adjusted group means for 
both poor and good readers within each of 
the grades on the seven reading scores. 

Highly significant differences between 
poor and good readers were discovered on 
all seven reading measures with Fs signifi- 
cant at the .001 level for all measures except 
auditory blending: (Gilmore) accuracy (F 
= 124.34, df = 1/71), comprehension (F = 
16.87, df = 1/71), rate (F = 183.90, df = 
1/71); (Gates-McKillop) whole words 
(F = 26.62, df = 1/71), word parts (F = 
44.85, df = 1/71), nonsense words (F = 
52.23, df = 1/71), auditory blending (F = 
2.85, df = 1/71, p < .025). In addition, 
there were significant Grade x Reading 
Level interactions on five of these mea- 
sures: accuracy (F = 4.33, df = 3/71, p < 
.01); rate (F = 2.86, df = 3/71, p < .05); 
word parts (F = 3.55, df = 3/71, p < .025); 
nonsense words (F = 5.64, df = 3/71, p < 
.005); auditory blending (F = 2.85, df = 
3/71, p < .05). Separate comparisons of 
good and poor readers within each grade 
level were performed for these five measures 
to identify instances in which the differences 
between poor and good readers increased or 
decreased consistently across the grades. 
Such trends were found in the case of the 
accuracy and rate data. Although poor 
readers performed significantly lower than 
good readers in both accuracy and rate at 
each grade level (all ps < .001), the differ- 
ences tended to be greater within the latter 
two grades. 
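
The degrees of freedom reported in these analyses follow from the design. The sketch below is a simple check under the usual assumption that each covariate costs one error degree of freedom; the function name is hypothetical.

```python
def ancova_df(levels_a: int, levels_b: int, n_per_cell: int,
              n_covariates: int = 1) -> dict:
    """Degrees of freedom for a two-way ANCOVA with equal cell sizes."""
    n_total = levels_a * levels_b * n_per_cell
    n_cells = levels_a * levels_b
    return {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "AxB": (levels_a - 1) * (levels_b - 1),
        # Error df: N minus one per cell mean, minus one per covariate.
        "error": n_total - n_cells - n_covariates,
    }


# 4 (Grade) x 2 (Reading Level), 10 subjects per cell, PPVT IQ as covariate.
full_design = ancova_df(4, 2, 10)
# A single grade: poor versus good readers, 10 per group, same covariate.
within_grade = ancova_df(1, 2, 10)
```

The full design yields 1/71 for reading level and 3/71 for the interaction, and a comparison within one grade yields 1/17, matching the values given in the text.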


Analysis of the Cognitive Style Measures 


As with the reading measures, the cogni- 
tive style measures were analyzed using a 
4 X 2 analysis of covariance, with PPVT IQ 
scores as the covariate. Table 2 presents 
the adjusted group means for poor and good 
readers on 11 variables derived from the 
cognitive style measures. 

Analyses of the frequencies of analytic, 
relational, and inferential concepts on the 
CST-A revealed no significant results for 
reading level or Grade x Reading Level 
interaction. However, the interactions for 
both the analytic and the relational scores 
approached significance (analytic: F = 
2.57, df = 3/71, p < .10; relational: F = 
2.39, df = 3/71, p < .10), prompting specific 
comparisons for these scores. In the fifth 
grade, poor readers gave significantly fewer 
analytic responses (F = 4.77, df = 1/17, 
p < .05) and significantly more relational 


TABLE 1 

ADJUSTED GROUP MEANS FOR THE GILMORE ORAL READING TEST AND GATES-McKILLOP READING 
DIAGNOSTIC TEST MEASURES 

                          2nd             3rd             4th             5th 
Reading measure        Poor   Good     Poor   Good     Poor   Good     Poor   Good 
Gilmore 
  Accuracy              5.3   26.7     16.6   38.6     15.4   50.1     20.4   57.5 
  Comprehension        10.4   19.9     19.4   21.5     20.4   28.1     21.4   32.3 
  Rate                 23.4  105.8     46.6  109.7     43.9  122.9     53.5  154.9 
Gates-McKillop 
  Whole Words          34.7   58.8     44.6   72.4     63.6   77.4     44.3   80.1 
  Word Parts            4.0   72.8     54.4   82.4     51.2   90.0     47.2   93.2 
  Nonsense Words        9.1   17.4     13.6   17.5     15.4   18.2     13.4   18.4 
  Auditory Blending    12.2   14.0     12.0   13.8      9.5   14.7     14.2   14.3 

TABLE 2 

ADJUSTED GROUP MEANS FOR ALL SCORES DERIVED FROM THE COGNITIVE STYLE MEASURES 

                     2nd             3rd             4th             5th 
Measure           Poor   Good     Poor   Good     Poor   Good     Poor   Good 
CST-A 
  Analytic         9.6    7.6      6.8    8.3      6.9    4.2      4.2    9.2 
  Relational       2.7    4.9      6.5    5.2      6.1    8.9      9.8    4.9 
  Inferential      2.5    4.2      1.0    1.6      1.7    1.8      0.8    1.3 
MFFT 
  Latency         78.4  118.0     97.0   94.7    103.0   91.0     96.2  102.6 
  Errors          17.2   16.5     17.4   16.2     16.6   15.7     15.2   12.8 
FDT 
  Rd Time 1      131.5   92.7    110.2  104.9    115.7   73.1     97.7   68.5 
  Error 1          0.7    1.4      0.4    0.4      2.5    0.4      0.2    0.9 
  Rd Time 2      151.7   90.0    110.4   99.0    109.8   64.3     97.4   65.7 
  Error 2          2.7    0.7      1.1    0.5      2.4    0.4      0.8    0.2 
  Rd Time 3      200.6  153.6    170.2  179.8    217.8  119.9    129.7  121.3 
  Error 3          1.8    0.4      0.6    1.0      3.7    0.5      1.5    1.2 

Note. Abbreviations: CST-A = Conceptual Styles Test-Form A; MFFT = Matching Familiar 
Figures Test; FDT = Fruit Distraction Test; Rd = Reading. 


responses (F = 4.51, df = 1/17, p < .05) 
than good readers. There were no differ- 
ences in the earlier grades. 

Analyses of the latency and error scores 
on the MFFT revealed no significant re- 
sults. 

Analyses of the difference scores between 
Card 2 and Card 1 on the FDT revealed a 
significant reading level main effect for the 
error difference scores (F = 5.24, df = 1/71, 
p < .05) and a significant Grade x Reading 
Level interaction for the reading time dif- 
ference scores (F = 3.28, df = 3/71, p < 
.05). There was also a significant Grade x 
Reading Level interaction for the reading 
time difference scores between Card 3 and 
Card 1 (F = 3.18, df = 3/71, p < .05). 
Separate comparisons performed on these 
interactions failed to show any consistent 
trends in the degree of disparity between 
good and poor readers across the four grade 
levels. No significant findings resulted from 
the analysis of the number of achromatic 
drawings recalled on Card 2. 

Sizable differences between poor and 
good readers were apparent in both reading 
times and errors on each individual card 
comprising the FDT, and analyses of co- 
variance were therefore performed on these 
data. Significant reading level main effects 
were found for Card 1 reading times (F = 
16.34, df = 1/71, p < .001), Card 2 reading 


times (F = 31.55, df = 1/71, p < .001), and 
errors (F = 6.30, df = 1/71, p < .025). 
Significant Grade x Reading Level inter- 
actions were found on Card 2 reading times 
(F = 4.71, df = 3/71, p < .01), Card 3 
reading times (F = 3.23, df = 3/71, p < 
.05), and errors (F = 3.25, df = 3/71, p < 
.05). Separate comparisons performed on 
these interactions failed to reveal any 
trends in the degree of disparity between 
good and poor readers across the grade 
levels. 

Partial correlations were computed be- 
tween the cognitive style measures and the 
reading measures, for the total sample of 
80 children as well as for the 40 second- 
and third-grade children and the 40 fourth- 
and fifth-grade children. There were no 
significant correlations between the CST-A 
scores or the MFFT latency scores and the 
reading measures. MFFT error scores cor- 
related significantly with accuracy (r = 
-.238), comprehension (r = -.291), and 
rate (r = -.247) on the Gilmore, for the 
total sample but not for the separate sam- 
ples of older and younger children. 

Substantially larger correlations were 
found between FDT scores and the read- 
ing measures.* Time difference and error 

*A table of correlations between the Fruit Dis- 
traction Test (FDT) and reading measures is avail- 
able from the author upon request. 




difference scores between Card 2 and Card 
1 were more highly correlated with the 
reading measures (average r = -.202) than 
were the corresponding difference scores 
between Card 3 and Card 1 (average r = 
-.161). Similarly, reading times and errors 
on Card 2 were more highly related to the 
reading measures (average r = -.298) than 
the corresponding scores on Card 3 (average 
r = -.202). Reading measures tended to 
correlate more highly with reading time and 
error scores on the individual cards than 
with time difference and error difference 
scores computed between cards. In particu- 
lar, the reading time scores on Card 1 and 
Card 2 and the error scores on Card 2 were 
related to reading measures for the total 
sample. The reading time scores on Card 1 
correlated significantly with every reading 
measure except whole words (rs ranging 
from -.145 to -.465). The reading time 
scores on Card 2 correlated significantly 
with every reading measure (rs ranging 
from -.224 to -.549). The error scores on 
Card 2 correlated significantly with every 
reading measure except auditory blending 
and word parts (rs ranging from -.063 to 
-.420). For both the individual card scores 
and the difference scores on the FDT, cor- 
relations were consistently higher for 
younger children (average r = .235) than 
for older children (average r = .174). 


DISCUSSION 


The analyses of the reading measures 
confirm substantial differences between the 
poor and good readers in all of their reading 
skills. The fact that the poor readers were 
receiving special reading instruction and 
were thus identified as poor readers to them- 
selves and their peers is a potential source 
of confounding in this study. This differ- 
ence between groups may have been am- 
plified by the stigma attached to the poor 
readers or may have been moderated by 
the special training they were receiving; 
however, this confounding was unavoidable 
given the available subjects. 

Conceptual style preferences generally 
failed to distinguish between poor and good 
readers, the only exception being for the 
fifth-grade children. In this case, poor 
readers tended to employ the development- 
ally less advanced relational concepts rather 
than analytic concepts. 

Cognitive tempo data failed in all in- 
stances to distinguish between good and 
poor readers. Kagan’s (1965) findings were 
upheld only in the case of three significant 
but moderate correlations between MFFT 
errors and Gilmore Oral Reading Test 
scores. Eleven other correlations between 
reading and cognitive tempo measures failed 
to attain significance, raising serious doubt 
about the importance of this dimension to 
reading ability. 

In contrast to the conceptual style and 
cognitive tempo dimensions, substantial dif- 
ferences between good and poor readers 
were found along the attentional style 
dimension. However, the results contradict 
those of Santostefano et al. (1965). In the 
present study, difference scores between 
Card 2 and Card 1 successfully distin- 
guished good and poor readers, while in 
Santostefano et al.'s study, difference scores be- 
tween Card 3 and Card 1 were found to 
relate to reading. There appears to be no 
way to resolve these different outcomes. 
Together, the studies point up the necessity 
of assessing attention deployment in the 
face of distracting information both em- 
bedded within the relevant stimulus (Card 
3) and in close proximity to it (Card 2). 

In terms of developmental attainments 
along each cognitive style dimension, poor 
readers share with their normal peers the 
same orientations toward the similarities 
common to visual stimuli and the same 
abilities to analyze such stimuli into their 
component parts, skills which are clearly 
necessary in early reading acquisition. Poor 
readers also demonstrate the same concerns 
for the adequacy of their responses and the 
same tendencies to delay initial responses 
in order to reflect over their answers; with- 
out these attainments in reflection, initial 
attempts at reading would amount to noth- 
ing more than random guessing with too 
little time allotted between stimulus and re- 
sponse for the child to apply any of what 
he may have learned from his reading pro- 
gram. The present study indicates that poor 
readers' difficulties lie not in the total 
amount of time they attended to particular 
problems but in the proportion of that time 
spent productively examining the relevant 
stimuli in their visual field. Poor readers’ 
failures to focus upon the relevant stimuli 
not only hinder them in the act of reading 
itself (by distracting them from the word or 
phrase to be read) but also in the process of 
learning how to read. Staats, Brewer, and 
Gross (1970) have shown that differences in 
the learning curves for good and poor 
readers being taught the letters of the al- 
phabet through operant conditioning exist in 
the early trials; during such trials, the 
direction of attention to the learning task at 
hand is a most crucial factor affecting the 
shape of the learning curves. 

Reading time and error scores on the in- 
dividual cards of the FDT differentiated be- 
tween good and poor readers more effec- 
tively than any of the three cognitive style 
dimensions. The average correlation be- 
tween all difference scores and their re- 
spective individual card scores was .485, 
indicating the cognitive-perceptual behavior 
assessed by the individual card scores to be 
related but not identical to the dimension of 
constriction-flexibility. Instead, the individ- 
ual card scores seem to reflect the child’s 
ability to process visually inputed informa- 
tion and to encode his response in a verbal 
channel. This sequential, visual-to-verbal 
transposition shares many similarities with 
the act of reading, and failures in inter- 
modal integration between information- 
processing channels have frequently been 
implicated in poor reading (Birch & Bel- 
mont, 1965; Bryden, 1972). 

There is some evidence of a shift from 
perceptual-motor and attentional deficits 
among younger poor readers to language 
and conceptual deficits among older poor 
readers, a finding consistent with the matu- 
rational lag hypothesis. Performance of 
the FDT was more highly related to read- 
ing scores for younger children, while scores 
on the CST-A and the PPVT were more 
highly related to reading scores for older 

children. However, the differences in the 
correlations for older and younger children 
are not large and the failure to find Grade 
x Reading Level interactions consistent 
with trends predicted from the maturational 
lag hypothesis must detract further from its 
support in the present study. 

Even if the evidence in this study was 
more supportive of the predicted shift in 
deficits from younger to older poor readers, 
such a shift might be explained by learning 
rather than by a genetically determined and 
organically tied maturational lag. Failures 
to learn earlier skills may impede acquisi- 
tion of later skills. To date there is no com- 
pelling evidence that the attentional and 
visual-verbal integrative skills related to 
reading in the present study are genetically 
determined. 

There is yet another way to account for 
differences in the nature of the deficits 
shown by poor readers of various ages. Evi- 
dence of these differences is typically 
derived from cross-sectional studies of 
groups of poor readers and there may well 
be differences in the composition of these 
groups from one grade to the next. A fair 
number of poor readers respond quickly to 
early special reading instruction during the 
first two or three grades and perform at 
average reading levels thereafter. Others 
continue in special reading, profiting rela- 
tively little from these programs and thus 
falling further behind with each successive 
year. Groups of poor readers may therefore 
be composed of two types of children: (a) 
those who have relatively minor perceptual- 
motor and attentional deficits which re- 
spond to specialized instructional techniques 
and (b) those with more severe cognitive-intel- 
lective and language dysfunctions which 
fail to respond to specialized reading and 
which indeed become aggravated as the 
child continues without the expanded ex- 
perience made available through the facility 
of reading. In subsequent grade levels, the 
proportion of the former type of poor reader 
may decrease, while the deficits of the latter 
type of poor reader become more severe. 
Longitudinal studies are needed in further 
investigations of the maturational lag hy- 
pothesis. In addition, investigators might 
begin examining the ability to profit from 
reading instruction as a critical dimension 
distinguishing among poor readers. 


Cognitive intervention may potentially be used to investigate both 
the cognitive processes underlying test performance and the possible 
incorporation of learning experiences into ability estimates. Inner-city 
high school students were randomly assigned to one of five experi- 
mental conditions or to the control group and were administered a 
test-intervention-retest sequence using verbal analogy items. It was 
found that although latent ability estimates may be significantly in- 
creased by a short intervention, modifiability depends on the type of 
intervention. Furthermore, it was found that general types of relation- 
ships may effectively mediate analogy item solving. 


The verbal analogy item has long been 
considered the best single type of item to 
measure general intelligence. The practical 
importance of the verbal analogy item is at- 
tested to by the popularity with which it is 
used in selecting individuals for special edu- 
cational curriculums, higher education, and 
training from a variety of ability tests (e.g., 
Scholastic Aptitude Test, School and Col- 
lege Ability Test, Miller Analogy Test). 
However, during the last several years, the 
lack of firm knowledge about the nature of 
what is measured by general intelligence 
tests (Ebel, 1973; McNemar, 1964) has 
caused increasing discontent among psy- 
chologists and educators. The persistence of 
such doubts, despite the wealth of correla- 
tional validity data, has indicated to some 
methodologists (McNemar, 1964; Messick, 
1972) the need for experimental data on the 
psychological processes used to solve intel- 
ligence test items. 

The nature of what is measured by intelli- 
gence tests is an especially pressing con- 
temporary problem because of claims that 
the tests discriminate unfairly against cer- 
tain subcultural groups. The basic issue is 
whether intelligence test performance may 
be meaningfully compared between persons 
of different subcultural groups when their 
learning experiences may be quite different. 
As Holtzman (1971) has noted, one of the 
most volatile criticisms leveled against in- 
telligence testing is that the tests measure 
the values of the dominant culture rather 
than intellectual capacity or potential. Al- 
though the use of capacity tests which are 
blind to both situational and cultural fac- 
tors has been increasingly criticized by ma- 
jor theorists (e.g., Cole & Bruner, 1971), 
classical models of ability tests do not per- 
mit the incorporation of varied learning ex- 
periences into the estimation of intellectual 
potential. Thus, a new basic model of abil- 
ity and a new type of validity data are need- 
ed to answer pressing concerns about the 
nature of what is measured by intelligence 
tests. 

Cognitive learning treatments on test-re- 
lated material may potentially be used to 
investigate both the cognitive processes un- 
derlying test performance and the possible 
incorporation of learning experiences into 
ability estimates. Popular experimental 
methods for studying cognitive processes in 
verbal learning have either supplied medi- 
ators (Rohwer, 1971) or instructed subjects 
to use strategies (Paivio & Foth, 1970) that 
may potentially facilitate performance. In- 
tervention may be used in a similar way to 
study psychological processes in solving 
analogy items, by comparing the facilita- 
tion of item performance when supplying 
mediators or when instructing subjects to 
use specified strategies to solve items. 

Similarly, cognitive learning treatments 
may also potentially be used to correct abil- 
ity estimates for varied learning experi- 
ences. Whitely and Dawis (in press) expli- 
cated an aptitude-ability model of testing 
from a test-intervention-retest sequence. A 
basic assumption for the Whitely and 
Dawis model is that test score gain over in- 
tervention mirrors the effects of deficient 
previous learning experiences. The apti- 
tude-ability model uses a weighted com- 
bination of two scores, initial ability and 
test gain, to estimate true potential (apti- 
tude). Test gain functions as a suppressor 
variable in the model, which increases the 
correlation with true potential by partialing 
out some of the experience-specific variance 
from the initial ability measure. 

The current study has the dual purpose of 
investigating (a) the psychological processes 
in analogy test performance and (b) the 
modifiability of latent ability scores. Both 
purposes may be accomplished by experi- 
mentally studying the effects of different 
learning treatments on test performance. 
The magnitude and nature of the effects of 
the learning treatments are specially ex- 
amined for potential standardization into 
a test-intervention-retest sequence. 


METHOD 


Design and Subjects 


The subjects were 184 students randomly se- 
lected from class lists within two inner-city high 
schools located in St. Paul, Minnesota. The se- 
lected students were randomly assigned to one of 
six conditions. Each condition was administered 
once in each school, with the group size varying 
from 8 to 20 students each. The one-and-a-half- 
hour session consisted of an analogy item pretest, 
a treatment condition (or “filler” task), and an 
analogy item posttest. 


Test Materials 


One persistent problem in the measurement of 
change is the equivalency of test forms. For tests 
developed according to traditional measurement 
models, test means, variances, and intercorrela- 
tions must be equal for difference scores to be 
meaningful. However, with the newer logistic mod- 
els of latent traits (i.e., Rasch, 1966), statistically 
equivalent forms may be obtained from any two 
subsets if item parameters are scaled and anchored 
according to the procedure described by Wright 
and Panchapakesan (1969). Furthermore, differ- 
ence scores between forms may be individually 
corrected for measurement error by computing a 
standardized difference score. Individuals with 
low-probability standardized differences (a z ratio 
computed by dividing the differences in Rasch 
ability scores between forms by the measurement 
error associated with each score) vary in ability 
more than can be expected from measurement er- 
ror alone. The standardized difference score per- 
mits greater objectivity in the measurement of 
ability change as compared to traditional ability 
difference scores, since test-form differences in dif- 
ficulty and precision at various score levels are 
statistically controlled. 
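
The standardized difference score described above can be sketched as follows. The text states only that the difference in Rasch ability scores is divided by the measurement error associated with each score; combining the two standard errors as the square root of the sum of their squares is an assumption made here, and the function name and example values are hypothetical.

```python
import math


def standardized_difference(ability_pre: float, ability_post: float,
                            se_pre: float, se_post: float) -> float:
    """z ratio for the change in Rasch ability between two test forms.

    Assumes the two measurement errors are independent, so the standard
    error of the difference is sqrt(se_pre^2 + se_post^2).
    """
    se_difference = math.sqrt(se_pre**2 + se_post**2)
    return (ability_post - ability_pre) / se_difference


# Hypothetical subject: ability estimates (in logits) and their standard errors.
z = standardized_difference(ability_pre=-0.40, ability_post=0.60,
                            se_pre=0.30, se_post=0.40)
print(f"standardized difference z = {z:.2f}")
```

A subject whose z exceeds a chosen critical value (for example, 1.65 for a one-tailed test at the .05 level) has changed more than measurement error alone would be expected to produce.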

A set of 142 verbal analogy items was developed 
to fit the Rasch logistic model of test performance. 
A prime feature of the analogy items was an un- 
usually low vocabulary level (Lorge-Thorndike fre- 
quency appropriate for sixth grade level or less), 
so that item solving would maximally depend on 
subjects’ ability to educe relationships rather than 
on knowledge of words. A combined sample of 
2,517 students was used to calibrate and anchor 
the items according to the procedure outlined by 
Wright and Panchapakesan (1969). The 142 items 
were divided into the following mutually exclusive 
sets for the experimental materials: (a) a 41-item 
analogy pretest, (b) a 41-item analogy posttest, 
(c) 50 items for the intervention, and (d) 10 in- 
structional examples. Although not required for 
equivalent test forms with items calibrated by the 
Wright and Panchapakesan (1969) procedure, 
items were matched for difficulty between forms 
to allow raw score comparisons and to equate 
measurement precision at the various ability levels. 


Experimental Conditions 


The experimental conditions consisted of five 
interventions and a control. In the control condi- 
tion, three tests* which are not highly correlated 
with analogy item performance were given as a 
“filler” task. The experimental conditions focused 
on practicing item solving, and the 50 intervention 
items were administered in a fixed order for all 
five experimental conditions. All items were ex- 
posed on a screen for 25 seconds each (10 seconds 
extra were given for feedback conditions) by a 
carousel slide projector. For two of the experimen- 
tal conditions, the 50 items were administered as 
a practice task only. In one practice condition, item 
feedback was provided by a special answer book- 
let which displayed the correct answer after the 
subject had completed an item (P+). In the other 
practice condition, no feedback was given (P). 


* These tests were the Minnesota Clerical Test 
and two tests from the French Kit of Cognitive 
Reference Tests, the Apparatus Test and the Sur- 
face Development Test. 

In the other three experimental conditions, sev- 
eral units of instruction were interjected between 
subsets of the 50 intervention items. Although the 
content of the instruction was constant, both the 
use of structural aids (described below) and feed- 
back varied between the three conditions with in- 
struction. The three conditions were defined as fol- 
lows: feedback but no structures (I+); structures 
but no feedback (Is); feedback and structures 
(Is+). In the conditions with feedback, Is+ and 
I+, the correct answer was displayed in a special 
answer booklet, as in condition P+. 


Instructional Units 


The instruction was based mainly on Whitely's 
(1973) empirical study of the types of analogy re- 
lationships designated by a group of successful test 
performers to describe the word-pair relationship. 
In the Whitely (1973) study, subjects sorted 60 
items into relationally homogeneous types accord- 
ing to their own implicit categories. Whitely (1973) 
identified the following eight latent categories of 
relationships from a latent partition analysis 
(Wiley, 1967) of the college students’ categoriza- 
tions: (a) similarities, (b) opposites, (c) word pat- 
tern, (d) class membership, (e) class naming, (f) 
conversion, (g) functional, and (h) quantity. 

All but two of the eight instructional units in 
the current study were designed to teach subjects 
to use the eight analogy relationships in solving 
items by presenting the type names and corre- 
sponding definitions followed by examples for each 
type. Each of the first six units was followed by a 
subset of the intervention items, with the types of 
relationships in the items varying according to the 
instructional unit (see Table 5 for set content). 

The six units on the relationship types were fol- 
lowed by two additional units of instruction for 
which no previous research was available. The 
seventh unit concerned the position of the related 
pair and blank in the analogy item stem, while the 
eighth unit of instruction concerned the role of re- 
sponse alternatives. 

Two types of structure were supplied for inter- 
vention items in Is+ and Is. An example of a 
fully structured analogy item is presented in which 
the type of relationship is given and the arrows 
point to the related pair: 


Eat:Slow :: — : Fast 
1) Speed 2) Drive 3) Fast 4) Drink 5) Ate 
(Answer: 3) 


OPPOSITE 


The complete instructions are available upon 
request from René V. Dawis, Psychology Depart- 
ment, University of Minnesota, Minneapolis, Min- 
nesota 55455. 

Both types of structure were given in the first four 
item subsets (arrows and relationship), while the 
remaining items had, at most, only one type of 
structure (see Table 5). 


RESULTS 


Two types of results were obtained. Data 
from pretest—posttest scores are relevant to 
the modifiability of ability estimates over 
various types of intervention. Data on the 
intervention items are more relevant to 
psychological process, since the concurrent 
effects of intervention can be analyzed. 


Analogy Test Scores 


The measurement of change has been so 
controversial that some methodologists have 
recommended not using gain scores (Cron- 
bach & Furby, 1970). However, gain has an 
irreplaceable role in the Whitely and Dawis 
(in press) model as a suppressor variable. 
Because of the controversy, several different 
variables which measure treatment effects 
were analyzed, including raw gain, Rasch 
gain, and the residualized posttest measure 
recommended by Cronbach and Furby 
(1970). Standard differences, a special fea- 
ture of Rasch-scaled test forms, were also 
analyzed to compare change scores to mea- 
surement error. The means and standard de- 
viations for all six dependent variables are 
presented in Table 1. 
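
For readers unfamiliar with the residualized posttest measure, the following minimal sketch shows one common way to compute it: each subject's posttest score minus the posttest score predicted from the pretest by a least-squares line. Fitting a single regression line to all subjects pooled is an assumption made here; the function name and the scores are hypothetical.

```python
def residualized_posttest(pretest: list[float], posttest: list[float]) -> list[float]:
    """Residuals from the least-squares regression of posttest on pretest."""
    n = len(pretest)
    mean_pre = sum(pretest) / n
    mean_post = sum(posttest) / n
    # Slope and intercept of the ordinary least-squares line.
    sum_xy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pretest, posttest))
    sum_xx = sum((x - mean_pre) ** 2 for x in pretest)
    slope = sum_xy / sum_xx
    intercept = mean_post - slope * mean_pre
    # Observed posttest minus the posttest predicted from the pretest.
    return [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]


# Hypothetical raw scores for five subjects.
pre_scores = [18.0, 22.0, 25.0, 29.0, 33.0]
post_scores = [21.0, 24.0, 31.0, 30.0, 38.0]
residuals = residualized_posttest(pre_scores, post_scores)
print([round(r, 2) for r in residuals])
```

Because the residuals are by construction uncorrelated with the pretest in the sample used to fit the line, this measure avoids the built-in dependence of raw gain on initial score.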

Dunnett’s t statistic was used to detect 
the significance of the interventions to 
change latent ability scores, since the level 
of significance is protected over the set of 
comparisons of each experimental group to 
the no-intervention control. Table 2 pre- 
sents the t values for Dunnett’s statistic, 
comparing the analogy test means of each 
experimental group to the control group. 
None of the experimental groups differed 
from the control group on either the raw 
score or Rasch ability score derived from 
the analogy pretest. On the change mea- 
sures, raw gain score, Rasch gain score, and 
residualized posttest score, the only experi- 
mental group to perform significantly better 
than the control was Is+ (instruction with 
feedback and structure). Practice with feed- 
back, practice without feedback, instruction 
with feedback only, and instruction with 
structure only failed to yield significantly 
higher change scores than the no-interven- 
tion controls. 


TABLE 1 

MEANS AND STANDARD DEVIATIONS OF ANALOGY TEST SCORES FOR THE 
FIVE EXPERIMENTAL CONDITIONS AND THE CONTROL GROUP 

                                  Analogy Test score (a)
                  Raw pretest    Raw gain      Rasch pretest  Rasch gain     Residualized posttest
Condition    n    M      SD      M      SD     M      SD      M      SD      M       SD
C            31   25.10  9.04    2.29   5.15   -.83   1.25    .35    .82     -1.46   5.05
P            32   25.19  9.24    3.53   3.92   -.17   1.28    .58    .57     -.21    3.71
P+           36   24.17  7.99    3.52   4.89   -.82   1.03    .74    .86     -.81    5.01
I+           26   20.31  9.83    3.38   5.86   -.47   1.43    .63    .75     -.84    5.76
Is+          34   23.62  9.31    6.26   5.47   -.36   1.28    1.04   .48     2.37    5.14
Is           25   24.20  7.91    4.00   4.51   -.15   1.10    [?]    .67     .16     4.60

Note. Abbreviations: C = control, P = practice, P+ = practice with feedback, I+ = instruc- 
tion with feedback, Is+ = instruction with structure and feedback, and Is = instruction with struc- 
ture. [?] marks an entry that is illegible in the source. 

(a) The F values for Bartlett-Box test of equality of variance are as follows: raw pretest = .43, raw 
gain = 1.11, Rasch pretest = .80, Rasch gain = 1.29, and residualized posttest = 1.71. 

Table 3 presents the means, standard de- 
viations, and range of the standardized dif- 
ference scores for individuals in each group. 
The average of the Rasch gain between 
analogy tests exceeded the measurement 
errors associated with each score only in the 
instruction condition with both feedback 
and structure (z > 1.65, p < .05, one-tailed). 

Table 4 presents the correlations of pretest 
scores, gain scores, and posttest scores. The 
pretest correlates highly with the posttest in 
all conditions for both raw and Rasch scores, 
indicating that subjects' relative ability does 
not change drastically. The raw gain score 
correlated significantly with pretest scores in 
only two conditions, P and Is+. Since McNemar 
(1969) has demonstrated that measurement error 
may produce significant negative correlations 
between gain and pretest scores, Rasch gain 
scores corrected for measurement error 
(standardized difference scores) were correlated 
with the Rasch scores on the pretest. The 
significant negative correlations in Is+ and P 
(shown in Table 4) indicate that the 
correlations may not be attributed to 
measurement error. 


TABLE 2 
DUNNETT t VALUES FOR ANALOGY TEST MEAN 
SCORES OF FIVE EXPERIMENTAL CONDITIONS AS 
COMPARED TO A CONTROL GROUP 

                                   Score
Experimental   Raw       Raw      Rasch     Rasch    Residualized
group          pretest   gain     pretest   gain     posttest
P              [?]       [?]      [?]       [?]      [?]
P+             [?]       [?]      [?]       [?]      [?]
I+             [?]       [?]      [?]       [?]      [?]
Is+            -.4[?]    3.01*    -.50      3.38*    3.06*
Is             -.22      1.35     -.31      1.51     1.31

Note. Abbreviations: P = practice, P+ = prac- 
tice with feedback, I+ = instruction with feed- 
back, Is+ = instruction with structure and feed- 
back, and Is = instruction with structure. [?] 
marks entries that are illegible in the source. 

* p = .01. 


TABLE 3 
MEANS, STANDARD DEVIATIONS, AND RANGE OF 
STANDARDIZED DIFFERENCE SCORES IN 
SIX CONDITIONS 

              Standardized differences
Group    M       SD      Maximum   Minimum
P        1.10    1.00    2.84      -.83
P+       1.10    1.18    3.91      -.93
I+       1.09    1.40    4.30      -2.58
Is+      1.75    1.23    4.65      -.28
Is       1.22    1.13    3.30      -.93
C        .45     1.82    3.22      -2.10

Note. Abbreviations: P = practice, P+ = prac- 
tice with feedback, I+ = instruction with feed- 
back, Is+ = instruction with structure and 
feedback, Is = instruction with structure, and C 
= control. 


Intervention Items 


To compare the treatment effects on the 
intervention items, two special features of 
the data had to be considered: (a) The 
items are highly intercorrelated; and (b) 
the intervention units dealt separately with 
the various types of relationships which 
vary from item to item. Therefore, it was 
desirable to retain the items as separate de- 
pendent variables in the item subsets but, at 
the same time, to utilize the item intercor- 
relations in the treatment comparison sta- 
tistic. A multivariate analysis of variance 
(MANOVA) most adequately meets these 
requirements, although the results are some- 
what biased by the use of dichotomous var- 
iables. Table 5 presents the F values from 
the MANOVA on each dependent variable 
vector. The following set of orthogonal con- 
trasts was computed: 

Effect                          P     P+    I+    Is+   Is
1. Instruction vs. practice     -3    -3    +2    +2    +2
2. Feedback in practice         -1    +1
3. Feedback in instruction                  +1    +1    -2
4. Structure in instruction                 -1    +1

With the total set of 50 items as the de- 
pendent variable vector, Table 5 shows that 
instruction yielded significantly better per- 
formance than practice. Similarly, feedback 
facilitated performance in both instruction 
and practice conditions while item structure 
did not have an overall effect on item per- 
formance. With the individual item subsets 
as dependent variable vectors, Table 5 
shows that instructional groups have supe- 
rior performance on all subsets except Item 
Set 2, class membership, class naming, and 
conversion relationships. Similarly, feed- 
back was associated with significantly higher 
probabilities of solving items on all sub- 
sets for instructional groups, with the excep- 
tion of Item Set 1. However, feedback fa- 
cilitated performance irregularly over item 
subsets in the practice conditions. 
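
The orthogonality of the four contrasts used in the MANOVA can be checked directly. The sketch below assumes that blank cells in the contrast table stand for zero weights and that the conditions are ordered P, P+, I+, Is+, Is; the dictionary and function names are illustrative only. With unequal group sizes, as in this study, the check applies to the coefficient vectors themselves rather than to size-weighted contrasts.

```python
from itertools import combinations

# Condition order assumed for every coefficient vector below.
GROUPS = ("P", "P+", "I+", "Is+", "Is")

# Contrast weights; blank cells in the printed table are taken to be 0.
CONTRASTS = {
    "instruction vs. practice": (-3, -3, 2, 2, 2),
    "feedback in practice": (-1, 1, 0, 0, 0),
    "feedback in instruction": (0, 0, 1, 1, -2),
    "structure in instruction": (0, 0, -1, 1, 0),
}


def dot(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Inner product of two coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_orthogonal_set(contrasts: dict[str, tuple[int, ...]]) -> bool:
    """True if every vector sums to zero and every pair has a zero dot product."""
    vectors = list(contrasts.values())
    each_sums_to_zero = all(sum(v) == 0 for v in vectors)
    pairwise_orthogonal = all(dot(a, b) == 0 for a, b in combinations(vectors, 2))
    return each_sums_to_zero and pairwise_orthogonal


print(is_orthogonal_set(CONTRASTS))
```

Four mutually orthogonal contrasts exhaust the four degrees of freedom among the five experimental conditions, so each effect in Table 5 is tested independently of the others under this coding.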

A group which received structured items 
(Is+) performed better than the non- 
structured group (I+) on only three item 
subsets. All the significant subsets had the 
full structure supplied, that is, both type of 
relationship and related pairs. The only 
full structure subset which did not favor 
Is+, Item Set 2, was also the only subset for 
which instructional groups did not score 
higher than practice groups. Partial struc- 
ture, in Item Sets 5, 6, and 7, was associated 
with no significant differences. Item Set 8 
had no structure for any group and did not 
vary significantly between groups. 


TABLE 4 
CORRELATIONS OF ANALOGY TEST SCORES IN SIX EXPERIMENTAL CONDITIONS 

                                         Condition
Variable                          C       P        P+      I+      Is+      Is
n                                 31      32       36      26      34       25
Pretest with posttest, raw
  score                           [?]     [?]      .86     .83*    .81**    .87**
Pretest with gain, raw
  score                           [?]     -.35*    .07     -.19    -.43**   .03
Pretest with posttest,
  Rasch scores                    [?]     .90**    .83     [?]     .84**    .88**
Pretest with standardized
  difference, Rasch scores        -.26    [?]      .10     [?]     [?]      [?]

Note. Abbreviations: C = control, P = practice, P+ = practice with feedback, I+ = instruc- 
tion with feedback, Is+ = instruction with structure and feedback, and Is = instruction with struc- 
ture. [?] marks entries that are illegible in the source; significance markers on some legible 
entries may also have been lost. 
* p significant at .05. 
** p significant at .01. 


TABLE 5 

F VALUES FOR MULTIVARIATE ANALYSIS OF VARIANCE OF 
INTERVENTION, TOTAL, AND SUBSET VECTORS 

                                                                          Contrast effect
                                       No. of  Structure             Practice vs.  Feedback in  Feedback in
Item set                               items   supplied (a)  df      instruction   practice     instruction  Structure
1. Similarities, opposites, and
   word pattern relationships          5       RP            5/143   3.37**        1.24         2.16         2.29*
2. Class membership, class naming,
   and change into relationships       5       RP            5/143   .73           2.26*        3.46**       1.36
3. Functional and quantity
   relationships                       5       RP            5/143   3.13**        2.09         5.92**       2.40*
4. Mixed relationships                 10      RP            10/138  2.91**        1.64         3.56**       2.07*
5. Mixed relationships                 5       P             5/143   5.15**        2.43*        5.15**       1.24
6. Unidentified relationships          5       P             5/143   2.53*         1.95         2.42*        .79
7. Mixed relationships                 10      R             10/138  1.90*         2.43**       5.75**       .84
8. Mixed relationships                 5       -             5/143   4.27**        3.31**       13.24**      .79
Total                                  50                    50/98   3.65**        1.01*        2.66**       1.01

(a) Abbreviations: R = type of relationship is supplied and P = related pair is supplied. 

* p significant at .05. 
** p significant at .01. 


DISCUSSION 


The results indicated that experimental 
intervention can sometimes significantly in- 
crease individual ability scores, as esti- 
mated from performance on verbal analogy 
items. Pretest to posttest comparisons of 
three change measures (raw gain score, 
Rasch-scaled gain score, and residualized 
posttest score) yielded significant ability 
increases in only one of the five intervention 
conditions, instruction with both feedback 
and structure. Practice on the intervention 
analogy items, even when accompanied by 
feedback, had no significant effect in chang- 
ing ability scores. Similarly, neither instruc- 
tion with feedback alone nor instruction 
with structural aids alone led to ability 
changes. 

The finding that only the most intensive 
intervention condition significantly changes 
ability scores may be more readily under- 
stood in comparison to previous attempts 
to study ability change over intervention. 
The British studies on the effects of coach- 
ing and practice on ability test scores, sum- 
marized by Vernon (1954), have been the 
most extensive research findings on ability 
as modified by test-related interventions. 
The British researchers were concerned with 
norm validity when the test preparation 
given to students varied between schools, 
and they devised a series of test-related in- 
structional units which were given to all 
students over several days. Similarly, Ja- 
cobs and Vandeventer (1969) in the United 
States were able to modify Raven's Pro- 
gressive Matrices scores by a lengthy series 
of interventions. In contrast to these studies, 
the intervention in the present study was 
quite brief (a single 50-minute session), 
and the failure of the less intensive condi- 
tions to change ability scores may have re- 
sulted from the severe time restrictions. 

The results obtained also have some im- 
plications for the use of modifiability in 
ability measurement, such as in the apti- 
tude-ability model (Whitely & Dawis, in 
press). First, ability scores can be signifi- 
cantly altered by an intervention which is 
short enough to standardize into a single 
testing session. Second, significant test gain 
may potentially function as a suppressor 
variable, as shown by its negative correla- 
tion with initial ability, even when corrected 
for measurement errors. These findings sup- 
port the feasibility of the aptitude-ability 
model in estimating true potential from a 
weighted combination of initial ability and 
test gain obtained in a test-intervention-re- 
test sequence. 

Some clues as to the psychological proces- 
ses involved in solving analogy items may 
be gleaned from the analysis of the inter- 
vention items as well as from the pretest- 
posttest results. Cognitively oriented in- 
struction, designed to teach the taxonomy of 
types of analogy relationships described by 
a test-sophisticated group, had an immedi- 
ate facilitatory effect on analogy item solv- 
ing. The facilitation of analogy item perfor- 
mance by general types of relationships is 
contrary to expectation if item solving de- 
pends on a simple associational process, as 
has been indicated by some previous re- 
search (Gentile, Kessler, & Gentile, 1969; 
Willner, 1964). Gentile et al. (1969) had 
suggested that the relationships used to 
solve analogy items were specific to both 
persons and items. The current study sug- 
gests that the use of an empirically derived 
relational taxonomy in item solving facili- 
tates performance. Apparently a more com- 
plex process than simple association under- 
lies analogy item solving. 

Interestingly, instruction was not effec- 
tive for all types of relationships. Instruc- 
tion significantly increased item perfor- 
mance for similarities, opposites, word pat- 
tern, functional and quantity relationships, 
while instruction in class membership, class 
naming, and conversion relationships had 
no apparent facilitatory effect on analogy 
item solving. Unfortunately, from the de- 
sign in the current study, it is impossible to 
separate the adequacy of the definition of 
ineffective relationships from the degree to 
which special instruction in the relationship 
actually facilitates performance. 

Supplying cognitive mediators (i.e., type 
of relationship, related pair) with analogy 
items effectively facilitated item solving 
only when full structure was provided. 
Supplying type of relationship or related 
pair did not alone significantly improve 
item performance. Apparently the medi- 
ator must elicit the full association rather 
than simply reducing the associative pos- 
sibilities. With respect to Spearman's 
(1923) three principles of intelligence and 
cognition, when the full structure is pro- 
vided, the task for the subject is to educe a 
correlate for the remaining stem word by 
using the supplied relationship. However, 
when only partial structure is supplied, the 
subject still has the dual task of educing re- 
lationships and correlates. This is similar to 
the initial task, except that the associative 
possibilities would theoretically be reduced. 

In summary, this study finds that modi- 
fiability of latent ability scores depends on 
type of intervention and that, minimally, a 
complex associational process is needed to 
account for the facilitation of analogy items 
by general types of relationships. 


AUDITORY-VISUAL INTEGRATION AND READING 
PERFORMANCE IN LOWER-SOCIAL- 
CLASS CHILDREN 


GERALD W. JORGENSON¹ 


Case Western Reserve University 


The present study examined the relationship of auditory-visual inte- 
gration and reading performance in a sample of first- and second-grade 
subjects from primarily lower-class backgrounds. The effects of grade 
level, sex differences, IQ, and single-modality auditory and visual 
functioning were examined as possible moderating variables. The 
results suggested that auditory-visual integration ability in this sample 
was developmental; however, the subjects functioned at a lower level 
than would be expected of middle-class subjects. Significant correla- 
tions were obtained between auditory-visual integration and reading 
vocabulary in the total sample and in the second-grade boys. Although 
neither IQ nor single-modality ability was a significant contributor 
to this relationship, the results suggested that grade level, sex differ- 
ences, and socioeconomic status did make significant contributions and 
should be considered when interpreting auditory-visual integration 
research. 


Since the original work of Birch and Bel- 
mont (1964, 1965), studies have demon- 
strated that a significant relationship exists 
between reading performance and the ability 
to judge the equivalence of auditory and 
visual stimuli (Beery, 1967; Ford, 1967; 
Kahn & Birch, 1968; Muehl & Kremenak, 
1966; Reilly, 1971; Sterritt & Rudnick, 
1966). With the possible exception of Reilly 
(1971), these studies have included primar- 
ily middle-class subjects. The present study 
examined the relationship of auditory-visual 
integration and reading performance in pri- 
marily lower-class first- and second-grade 
subjects to determine whether the conclu- 
sions of previous studies apply to this popu- 
lation. Within this general framework, four 
additional variables which may influence the 
relationship between auditory-visual inte- 
gration skill and reading performance were 
investigated: grade-level placement, sex- 
group membership, IQ, and single-modality 
functioning. The rationale for studying each 
of these variables is presented below. 


1 Requests for reprints should be sent to Gerald 
W. Jorgenson, Department of Education, Case 
Western Reserve University, Cleveland, Ohio 44106. 


Grade-Level Placement 


Birch and Belmont (1965) established 
that auditory-visual integration ability in- 
creased most rapidly in children between 
kindergarten and Grade 2. Kindergarten 
children functioned at little better than a 
chance level, while an average second grader 
got 80% of the items on a 10-item test cor- 
rect. The authors found a correlation of .70 
(N = 29, p < .001) between auditory-visual 
integration and reading performance in 
Grade 1. At higher grade levels, the correla- 
tions were not significant. In addition, at 
each grade level except kindergarten and 
sixth grade, a significant positive relation- 
ship existed between auditory-visual inte- 
gration and IQ, with a range of coefficients 
from .34 to .57. 

Subsequent research (Kahn & Birch, 
1968), utilizing an expanded and more diffi- 
cult auditory-visual integration test instru- 
ment, has indicated that auditory-visual in- 
tegration continues to develop throughout 
the elementary school years and continues to 
be significantly related to reading perform- 
ance at least through the sixth grade. How- 
ever, the conclusion of Birch and Belmont 
that auditory-visual integration ability de- 
velops most rapidly between kindergarten 
and Grade 2 remains viable. Precisely when 
auditory-visual integration "peaks" during 
this time span, and when the ability is most 
related to reading performance, remains un- 
clear. 

Reilly (1971) found significant differences 
in auditory-visual integration performance 
between first- and second-grade subjects, 
concluding that better discrimination is as- 
sociated with older children. In first-grade 
subjects he found a correlation of .23 (N = 
60, p < .05) between auditory-visual inte- 
gration and vocabulary as measured by the 
Gates-MacGinitie Reading Test. Auditory- 
visual integration was not significantly re- 
lated to either reading comprehension or 
total reading score. In the second grade, all 
correlations were significant: auditory-vis- 
ual integration and vocabulary (N = 56, 
r = .65, p < .005), comprehension (N = 56, 
r = .71, p < .005), total reading score (N = 
56, r = .70, p < .005). Since auditory-visual 
integration accounted for 42% of the reading 
vocabulary variance and 50% of the reading 
comprehension variance in the second-grade 
subjects, Reilly concluded that it was at that 
level that auditory-visual integration was 
most powerfully related to reading ability. 
This conclusion is the opposite of the one 
suggested by Birch and Belmont, who found 
that auditory-visual integration accounted 
for 49% of the reading variance in first- 
grade subjects and 18% of the variance in 
second graders. Although these results could 
perhaps be explained by methodological 
differences in the two studies—Reilly used 
a 20-item auditory-visual integration test, 
rather than the 10-item Birch and Belmont 
test, as well as a different reading perform- 
ance measure—the differing conclusions led 
the present authors to examine the effect of 
grade-level placement on auditory-visual 
integration performance level and on the 
magnitude of the relationships between 
auditory-visual integration and reading per- 
formance. 
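
The percentages of variance cited in this comparison follow from squaring the reported correlations. The short sketch below reproduces that arithmetic for Reilly's second-grade coefficients and for the first-grade coefficient of .70 reported by Birch and Belmont; the function name is illustrative only.

```python
def percent_variance_explained(r: float) -> int:
    """Percentage of variance shared by two variables, from their correlation r."""
    return round(100 * r**2)


# Correlations quoted in the text above.
reilly_grade2_vocabulary = percent_variance_explained(0.65)     # 42
reilly_grade2_comprehension = percent_variance_explained(0.71)  # 50
birch_belmont_grade1 = percent_variance_explained(0.70)         # 49

print(reilly_grade2_vocabulary, reilly_grade2_comprehension, birch_belmont_grade1)
```

Working in the other direction, the 18% figure cited for Birch and Belmont's second graders corresponds to a correlation of roughly .42.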


Sex Differences 


Muehl and Kremenak (1966) found that 
the sex of the subjects included in their sam- 
ple of first-grade children was not signifi- 
cantly related to auditory-visual integration 
performance. Reilly (1971), on the other 
hand, found that the relationship of audi- 
tory-visual integration to reading perform- 
ance at different grade levels was affected by 
the sex of the subjects. For example, in first- 
grade boys, no significant correlation was 
found between auditory-visual integration 
and reading performance; in first-grade 
girls, a significant correlation existed be- 
tween auditory-visual integration and read- 
ing vocabulary. In the present study, sex 
differences were examined as possible moder- 
ating variables to see if Reilly's results 
could be replicated. 


IQ 

The relationship between auditory-visual 
integration and intelligence has been exam- 
ined in previous studies but with contradic- 
tory results. Birch and Belmont (1964, 1965), 
Sterritt and Rudnick (1966), Muehl and 
Kremenak (1966), and Beery (1967) are 
among those who have found that auditory- 
visual integration significantly predicted 
reading performance, independently of the 
influence of IQ. Kahn and Birch (1968) 
found that controlling the effects of IQ did 
not affect the significant correlation of audi- 
tory-visual integration and word knowledge 
but reduced the correlation of auditory-vis- 
ual integration and comprehension in Grade 
1 to an insignificant level. Ford (1967), 
however, found that controlling for the ef- 
fects of IQ reduced the level of correlation 
between auditory-visual integration and all 
measures of reading performance to an in- 
significant level. Thus the question of 
whether auditory-visual integration is a 
significant predictor of reading performance 
when the effects of general intelligence are 
controlled remains unresolved. 


Single-Modality Ability 


There has been relatively little examina- 
tion in the literature of the effects of audi- 
tory and/or visual functioning on integrative 
ability. In terms of the effect of auditory 
functioning on auditory-visual integration 
performance, three studies (Birch & Bel- 
mont, 1964; Ford, 1967; Kahn & Birch, 
1968) administered the digit span subtest of 
the Wechsler Intelligence Scale for Children 
(WISC) to all or a sample of the subjects. 
None of these studies found a significant re- 
lationship between short-term auditory 
memory and auditory-visual integration 
performance. Kahn and Birch (1968) added 
an additional auditory test, asking the 10 
highest and 10 lowest auditory-visual inte- 
gration scorers at each grade level tested to 
“sing” or “tap out” patterns which corre- 
sponded to the auditory component of the 
auditory-visual test. This measure did not 
discriminate between high- and low-audi- 
tory-visual scorers, and the authors con- 
cluded that single-modality auditory per- 
formance did not contribute significantly to 
integrative performance. 

The influence of visual functioning on 
auditory-visual integration performance has 
been investigated by Kahn and Birch 
(1968). As a measure of visual perception, 
the 10 highest and 10 lowest auditory-visual 
integration scorers at each grade level tested 
in the study were asked to choose, from 
among three choices, the visual dot pattern 
that matched a visual dot pattern cue. There 
were no significant differences between low- 
and high-auditory-visual scorers on this 
test, and the authors concluded that single- 
modality visual perception did not make a 
significant contribution to integration per- 
formance. 

These conclusions regarding the influence 
of single-modality abilities must be tenta- 
tive. Several of the studies employed ex- 
treme group procedures wherein samples 
were split into the highest and lowest audi- 
tory-visual integration scorers. Such proce- 
dures tend to attenuate relationships which 
might have emerged had the total sample of 
subjects been considered. Moreover, the na- 
ture of the single-modality tasks utilized in 

previous studies is questionable. Particularly 
in the Kahn and Birch study, the visual per- 
ception test was too easy, in that only one 
subject out of the total sample tested made 
more than two errors. In this same study, 
one of the auditory as well as the visual 
single-modality measures consisted of orig- 
inal tests, created exclusively for the pur- 
pose of the study. The use of nonstandard- 
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ized, original single-modality tests for which 
there is neither validity nor reliability data 
is questionable. Finally, even when inves- 
tigators have employed a standardized mea- 
sure of single-modality auditory function- 
ing, this measure has typically been the 
WISC digit span, a subtest of a general in-
telligence test whose validity is derived from
its contribution to a composite IQ score, not 
from empirical evidence that it measures 
auditory memory. 

The present study investigated the rela- 
tionship of auditory-visual integration and 
single-modality auditory and visual func-
tioning using the Visual Sequential Memory
and the Auditory Sequential Memory sub- 
tests of the Illinois Test of Psycholinguistic 
Abilities (ITPA). The ITPA was selected 
because of its previous use with subjects 
demographically similar to those in the 
present study (Stephenson & Gay, 1972; 
Weaver & Weaver, 1967) and because of its 
widespread use as a differential diagnostic 
instrument specifically designed to measure 
single-modality strengths and weaknesses. 

In summary, the present study examined 
the applicability of previous research find- 
ings concerning the relationship between 
auditory-visual integration and reading per-
formance in a sample of first- and second-
grade lower-socioeconomic-status children.
In this investigation, the effects of grade 
level, sex, IQ, and single-modality function- 
ing were investigated as potential variables 
moderating the relationship between audi- 
tory-visual integration and reading skill. 


METHOD


Subjects 


Approximately one half of the first- and second- 
grade student population was randomly selected 
from an elementary school in a low-socioeconomic-
status community near a large midwestern city.
This procedure resulted in a sample of 86 students, 
52% of whom were in Grade 1 and 48% in Grade 2. 
Of the first-grade subjects, 19 were females and 26 
were males; at the second-grade level, 22 were fe- 
males and 19 were males. 

Of the total sample, 32% were ADC welfare
recipients, and an additional 34% were from fami- 
lies who fell below the poverty level established by 
the federal government. 


Test Instruments


Auditory-visual integration performance was as- 
sessed using the 10-item instrument and testing pro-
cedures developed by Birch and Belmont (1965).
The examiner presented an auditory cue to the sub- 
ject by tapping on a desk with a pencil in a pre- 
determined sequence and then exposed three sepa- 
rate dot patterns. The subjects selected which of 
the three dot patterns matched the auditory cue. To 
avoid the possibility of the subjects receiving visual 
cues (Sterritt & Rudnick, 1966), the examiner sat 
to the side of and slightly behind each subject to 
present auditory cues. Except for this change, the 
methods of test administration and the directions 
that were given followed those outlined by Birch 
and Belmont (1964, 1965). The method included a 
demonstration of the auditory and visual com- 
ponents of the auditory-visual integration task, the 
presentation of three practice items, followed by 
the presentation of the 10 test items. 

Auditory memory and visual memory were mea-
sured by the Auditory Sequential Memory and
Visual Sequential Memory subtests
of the revised Illinois Test of Psycholinguistic 
Abilities (Kirk, McCarthy, & Kirk, 1968). 

All subjects received the appropriate form of the 
Gates-MacGinitie Reading Test: Primary A for 
Grade 1 and Primary B for Grade 2. These tests 
yielded vocabulary and comprehension scores 
which were transformed into standard scores for 
purposes of analysis. 

In addition to the above information, Kuhlmann- 
Anderson IQ scores were obtained for each subject. 
In the school district from which the sample was 
selected, this test is routinely administered in the 
spring semester during the first-grade year. The 
results were available from the school records. 


Procedure 

The auditory-visual integration test was admin- 
istered to all subjects in individual testing sessions 
by the first author using the methods described 
above. Auditory memory and visual memory tests were ad-
ministered to all subjects in individual testing ses-
sions by the second author, using the standardized 
directions for administration specified in the ITPA 
manual. All testing was done during the morning 
hours of the school day. 

The Gates-MacGinitie Reading Tests were ad- 
ministered by the subjects' classroom teachers who 
used the procedures outlined in the test manuals. 


RESULTS AND DISCUSSION


Since the auditory-visual integration in- 
strument and procedures were patterned af- 
ter those developed by Birch and Belmont 
(1965), Table 1 presents means, standard 
deviations, and ranges of auditory-visual 
integration scores for the Birch and Belmont 
primarily middle-class sample, in compari-
son with similar information for the subjects

TABLE 1
COMPARISON OF AUDITORY-VISUAL INTEGRATION
PERFORMANCE IN THE PRESENT STUDY
WITH PERFORMANCES REPORTED BY
BIRCH AND BELMONT (1965)

Study               Grade    n    Mean    SD    Range
Birch and Belmont     1     30    5.6    2.2    1-9
Present               1     45    3.6    1.3    1-6
Birch and Belmont     2     30    7.9    1.6    5-10
Present               2     41    4.4    1.9    1-8


in the present study. The auditory-visual 
integration task appears to be more difficult 
for the subjects in the present study. The 
4.4 mean for the second-grade subjects is 
comparable to the 4.1 mean for the kinder- 
garten subjects in the Birch and Belmont 
study. In addition, 70% of Birch and Bel- 
mont’s kindergarten subjects got 4 or fewer 
items correct on the auditory-visual integra-
tion test, while at first grade, only 30%
were functioning at this level. In the present 
study, 76% of the first-grade subjects got 4 
or fewer correct; the comparable percentage 
at the second-grade level was 51%. Finally, 
Birch and Belmont found that second-grade 
subjects averaged 80% correct on the 10- 
item test. The average percentage correct 
in the present study was approximately 45%. 

On the basis of their results, Birch and 
Belmont concluded that the period of most 
rapid development of integration skill was 
between kindergarten and Grade 2. The 
results of the present study, however, sug- 
gest that integration skill has not developed 
to a level comparable to that of middle- 
class children by the second grade. This re- 
sult suggests that the conclusions of Birch 
and Belmont should be modified to take ac- 
count of the socioeconomic-status level of 
the subjects. 

Further analysis of the data in the present 
study involved an examination of the rela- 
tionship of auditory-visual integration to 
sex and grade level. Table 2 presents means
and standard deviations of the total number 
correct on the auditory-visual integration 
task for each sex at each grade level tested. 
These data were subjected to a 2 (Sex) x 2

TABLE 2
MEANS AND STANDARD DEVIATIONS OF
AUDITORY-VISUAL INTEGRATION
SCORES BY GRADE AND SEX

               n               Mean              SD
Grade    Boys   Girls     Boys   Girls     Boys   Girls
1         26     19       3.5    3.9       1.0    1.4
2         19     22       4.2    4.6       2.0    1.9


(Grade Level) unweighted means two-way 
analysis of variance. While neither the inter-
action effect nor the sex main effect was
statistically significant, there was a statisti-
cally significant grade level main effect
(F = 4.37, df = 1/82, p < .05). Thus, al-
though there were no significant differences
between males and females in mean audi- 
tory-visual integration scores, auditory- 
visual integration scores of second graders 
were significantly higher than those achieved 
by first graders.

As a further examination of the age re- 
latedness of the auditory-visual integration 
task, Pearson product moment correlations 
were computed between auditory-visual in- 
tegration scores and chronological age. This 
analysis resulted in a statistically signifi- 
cant correlation coefficient of .304 (df = 84, 
p < .01). This result, along with the sta- 
tistically significant grade-level main effect 
reported above, suggests that in a predomi- 
nantly lower-socioeconomic-status sample, 
auditory-visual integration is a develop- 
mental, linearly age-related skill. The low 
scores on the auditory-visual integration
task, however, suggest that the most rapid
period of development occurs at some point 
beyond the second grade. 
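
As a rough check on the significance level reported for the age correlation, the sketch below applies Fisher's z transformation to r = .304 with n = 86 (df = 84 implies n = 86). The large-sample normal approximation is an assumption made for illustration, not necessarily the test used in the study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r: float, n: int) -> tuple[float, float]:
    """Approximate two-tailed test of H0: rho = 0 via Fisher's z transform.

    Illustrative helper (not from the original study); relies on the
    large-sample normal approximation with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# r and n are taken from the text: r = .304, df = 84, so n = 86.
z_stat, p_value = fisher_z_test(r=0.304, n=86)
print(f"z = {z_stat:.2f}, two-tailed p = {p_value:.4f}")
```

Under this approximation the two-tailed probability falls below .01, in line with the level reported above.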

The effects of auditory and visual short- 
term memory on auditory-visual integra- 
tion were investigated by computing Pear- 
son product-moment correlations between 
auditory-visual integration scores and stan-
dard scores obtained on the Auditory Se-
quential Memory and Visual Sequential
Memory subtests of the ITPA. In light of 
the interest in sex as a moderating variable, 
correlations were computed separately for 
each sex as well as for the total sample.
Table 3 presents these results. 

None of the total sample (combined 
sexes) correlations were statistically signifi- 
cant. In the separate analysis by sex, the 
visual memory subtest was significant at the 
.05 level for the second-grade boys only. 

The results for auditory short-term mem- 
ory substantiate previous research (Birch & 
Belmont, 1965; Ford, 1967; Kahn & Birch, 
1968). It appears that short-term auditory 
memory, as measured by either the WISC 
digit span subtest or the Auditory Sequential 
Memory subtest of the ITPA, is unrelated 
to auditory-visual integration skill. 

Like auditory sequential memory, visual 
sequential memory was generally not sig-
nificantly related to auditory-visual inte-
gration performance. While the one signifi- 
cant relationship between visual sequential 
memory and auditory-visual integration in 
the second-grade boys might have arisen 
from chance factors, the fact that visual 
sequential memory accounted for almost 


TABLE 3
PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENTS BETWEEN AUDITORY-VISUAL
INTEGRATION PERFORMANCE AND AUDITORY AND VISUAL MEMORY

                 Boys                    Girls                 Boys & Girls
Grade     Auditory    Visual      Auditory    Visual       Auditory     Visual
1          .094a      -.113       -.082c       .000       [illegible]    .003
2          .398b       .549*      -.259d    [illegible]   [illegible]    .083
1 & 2   [illegible]    .264       -.119        .098          .022        .188

a n = 26.
b n = 19.
c n = 19.
d n = 22.
* p < .05.

one third of the variance of auditory-visual
integration in second-grade boys suggests
that single-modality visual functioning may
be a significant correlate of auditory-visual
integration. Additional research is needed to
clarify this point.
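
The "almost one third of the variance" figure follows from squaring the correlation. The short sketch below does that arithmetic for the second-grade boys' visual memory coefficient of .549 from Table 3; the function name is illustrative only.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables with correlation r."""
    return r * r


# .549 is the visual sequential memory coefficient for second-grade boys (Table 3).
share = variance_explained(0.549)
print(f"r squared = {share:.3f}")
```

The result is close to .30, which is slightly under one third.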

To investigate the relationship between
auditory-visual integration and reading per-
formance, Pearson product-moment corre-
lations were computed between auditory-
visual integration scores and standard scores
obtained on the Vocabulary and Compre-
hension subtests of the Gates-MacGinitie
Reading Tests. Correlations were computed
separately for each sex as well as for total sample
scores. These results are presented in Ta-
ble 4.

There was no significant relationship be-
tween auditory-visual integration and com-
prehension. On the other hand, the relation-
ship between auditory-visual integration
and the vocabulary measure was significant
(p < .05) for the combined sexes sample.
However, further examination reveals a sig-
nificant correlation for males but a nonsig-
nificant correlation for females. One might
hypothesize that the significant correlation
found in the total sample may be largely
attributable to the strength of the correlation
between auditory-visual integration and
vocabulary found in the boys.

A final issue examined in the present
study pertained to the influence of general


TABLE 4
PEARSON PRODUCT-MOMENT CORRELATION
COEFFICIENTS BETWEEN AUDITORY-VISUAL
INTEGRATION PERFORMANCE AND READING
VOCABULARY AND COMPREHENSION

                  Boys                          Girls                      Boys & Girls
Grade    Vocabulary  Comprehension    Vocabulary  Comprehension    Vocabulary  Comprehension
1          .874*        .086            .110c        .078            .262         .099
2          .460b*       .349            .281d        .061            .016        -.111
1 & 2      .419**       .192            .142         .043            .225*        .056

a n = 26.
b n = 19.
c n = 19.
d n = 22.
* p < .05.
** p < .01.

intelligence on the relationship between
auditory-visual integration and reading per-
formance. Kuhlmann-Anderson IQ scores
were available for 81 subjects or 94% of the 
present sample. These IQs ranged from 80 
to 143, with a mean IQ score of 107.3 and a 
standard deviation of 12.02. 

Pearson product-moment correlations
were computed between auditory-visual in-
tegration and Kuhlmann-Anderson IQ
scores. Correlations were computed sepa- 
rately for each sex at each grade level as 
well as across sexes and grade levels. None 
of these coefficients was significant. Birch 
and Belmont (1965) reported a significant, 
positive relationship between auditory-vis- 
ual integration and IQ in Grades 1 and 2 
(p < .01 and p < .05, respectively); the
present study failed to replicate their find- 
ing. 

The discrepancy between the two studies
may be attributable to methodological arti- 
facts—for example, the two studies em- 
ployed different measures of IQ, and the 
auditory-visual integration task employed 
in the Birch & Belmont study failed to pro- 
vide an adequate basal for the subjects in 
the present study and thereby restricted the 
range of the auditory-visual integration 
measure. On the other hand, it is also possi- 
ble that while auditory-visual integration 
and IQ are significantly correlated in mid-
dle-socioeconomic-status subjects, the two
measures are not significantly related in a 
predominantly lower-socioeconomic-status 
population. 

In the present study, the correlations be-
tween auditory-visual integration and read-
ing vocabulary were significant for the total
sample and for the total sample of boys. 
First-order partial correlations were com- 
puted to determine the stability of these 
correlations when the effects of IQ were 
controlled. After partialing, the relationship 
between auditory-visual integration and vo-
cabulary remained significant (r = .49, p <
.01 for the boys; r = .30, p < .05 for the
total sample). In fact, the negative direc-
tion of the IQ/auditory-visual integration
correlation (r = -.153 for total boys; r =
-.079 for the total sample) caused the final
correlation to be higher than it was origi-
nally. These results corroborate conclusions
of Birch and Belmont (1964, 1965), Sterritt 
and Rudnick (1966), Muehl and Kremenak 
(1966), Beery (1967), and Kahn and Birch 
(1968); namely, that independently of the
influence of IQ, auditory-visual integration 
is a significant correlate of reading perform- 
ance. 
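
The first-order partial correlation described above can be sketched with the standard formula. The integration/vocabulary (.419) and integration/IQ (-.153) values for boys come from the text and Table 4; the IQ/vocabulary correlation is not reported in this section, so the value used below is a hypothetical placeholder chosen only to show how a negative integration/IQ correlation raises the partial coefficient.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_avi_vocab = 0.419   # boys, integration with vocabulary (Table 4)
r_avi_iq = -0.153     # boys, integration with IQ (reported in the text)
r_vocab_iq = 0.30     # HYPOTHETICAL: IQ/vocabulary correlation is not given

partial = partial_corr(r_avi_vocab, r_avi_iq, r_vocab_iq)
print(f"zero-order r = {r_avi_vocab:.3f}, partial r = {partial:.3f}")
```

With any positive IQ/vocabulary correlation, the negative integration/IQ term adds to the numerator, so the partial coefficient exceeds the zero-order one, which is the pattern the authors describe.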


GENERAL DISCUSSION 


The results of the present study suggested 
that the findings of previous research should 
be viewed as tentative and restricted to the 
specific type of subjects under study, that 
is, typically middle-class boys. While the 
present study itself must be replicated with 
additional samples from a lower-socioeco- 
nomic-status population and must be ex- 
panded to include a larger range of grade 
levels and samples from several socioeco- 
nomic-status backgrounds, there is at least 
preliminary evidence that both sex and 
socioeconomic background might be impor- 
tant factors to consider in interpreting not 
only auditory-visual integration functioning 
but also the relationship between auditory-
visual integration skill and reading perform-
ance. 

The significant relationship of auditory- 
visual integration and reading vocabulary 
in the present sample of boys but not in the 
sample of girls substantiated Reilly's (1971) 
suggestion that sex differences should be 
taken into account in interpreting test re- 
sults. The potential importance of sex dif- 
ferences is particularly pronounced in the 
area of auditory-visual integration research 
since many of the studies conducted (see 
Birch & Belmont, 1964; Ford, 1967; Kahn 
& Birch, 1968; Sterritt & Rudnick, 1966; 
Rudnick, Sterritt, & Flax, 1967) have in- 
cluded only boys in their samples. The pres- 
ent findings, coupled with those of Reilly 
(1971), suggest that the results of these ear- 
lier studies may be limited only to the sub- 
jects employed, namely, middle-class boys. 

Unlike previous studies, the present study 
investigated the relationship between audi- 
tory-visual integration and single-modality 
functioning by employing standardized in- 
struments specifically designed to assess 
auditory and visual memory abilities. The 
present study supported earlier research
(Birch & Belmont, 1965; Ford, 1967;
Kahn & Birch, 1968) and found no signifi-
cant relationship between auditory-visual
integration and auditory sequential mem- 
ory. By contrast, the visual sequential mem- 
ory task was significantly related to audi- 
tory-visual integration performance in the 
present sample of second-grade boys (p < 
.05). Since the significant relationship ob- 
tained in the present study was for second- 
grade boys only, the result must be viewed 
as tentative and must be investigated fur- 
ther. 

As the literature was being examined in 
preparation for the present research and as 
the study was being carried out, it was ap- 
parent that an obvious need exists for 
greater standardization and specificity in
auditory-visual integration research. Dif-
ferent measures of reading performance, IQ,
and single-modality functioning have been
used almost indiscriminately. Of greater
import, however, may be the types of audi-
tory-visual integration measures employed.
These range from the 10-item matching for-
mat using dot-dash stimuli employed by
Muehl and Kremenak (1966), to the 10-
item multiple-choice format developed by
Birch and Belmont (1964, 1965) and utilized
in the present study, to the 20-item multiple-
choice formats used by Kahn and Birch
(1968) and Reilly (1971). A major diffi-
culty, and possibly a cause of much of the
conflicting evidence, may be the lack of a
standardized auditory-visual integration
test instrument that meets basic require-
ments of reliability and validity. In the esti-
mation of the present researchers, the devel-
opment of such an instrument would be a
fruitful next step.
IMPACT OF COEDUCATION ON “FEAR OF SUCCESS” IMAGERY


EXPRESSED BY MALE AND FEMALE HIGH 
SCHOOL STUDENTS¹


RONALD WINCHEL, DIANE FENNER, AND PHILLIP SHAVER²
Columbia University 


Horner has stated that the motive to avoid success (“fear of success”)
is increased in females who are forced to compete with males, and she 
has suggested that coeducation is detrimental to females’ academic 
performance. In the present study, 240 male and female high school 
seniors were tested for fear of success, some while attending coed and 
some while attending noncoed high schools. The kind of elementary 
school the subjects had attended, coed or noncoed, was also recorded. 
Results indicated that (a) both male and female subjects expressed 
more negative themes when writing stories about a successful female 
than when writing about a similarly successful male; (b) the coed 
versus noncoed distinction was a potent predictor of fear of success re- 
sponse in female subjects, especially at the elementary school level; 
and (c) when negative consequences of success were mentioned, they 
were usually social or affiliative if the successful figure in the story was 
female and nonaffiliative if the figure was male. Theoretical and prac- 


tical implications were briefly discussed. 


Recent studies reported by Horner (1970, 
1972) have greatly increased our under- 
standing of the motivational differences be- 
tween American males and females in 
achievement situations. It had long been 
known that a large body of results and in- 
ferences concerning achievement motivation 
in males did not apply for some reason to 
females (e.g., Atkinson, 1958; French &
Lesser, 1964; Lesser, Krawitz, & Packard,
1963; Veroff, Wilcox, & Atkinson, 1953),
but no adequate explanation had been 
formulated. Horner hypothesized that fe- 
males are hampered by a form of anxiety, 
which she called “fear of success,” that is 
uncommon in males. In order to measure 
fear of success, Horner asked female college 
students to tell a story based on the follow- 
ing projective cue: “After first-term finals, 


*The authors gratefully acknowledge the co- 
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Brooklyn, and the Joel Braverman Yeshiva High 
School, Brooklyn. 

* Requests for reprints should be sent to Phillip 
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versity, New York, New York 10027. 
Ann finds herself at the top of her medical
school class.” A majority of her subjects,
over 65%, portrayed Ann as anxious or
guilty or predicted that Ann’s success would
have unpleasant consequences, such as loss
of femininity or social rejection. In a com-
parable group of male subjects, responding
to a cue describing “John” rather than
“Ann” as the successful medical student


was also found that females who expressed
fear of success on the projective measure
performed worse in a competitive than in a
noncompetitive situation, whereas most of
the males and the minority of females who
did not show fear of success performed
better under competitive conditions.

Because Horner gave the John cue only
to males and the Ann cue only to females,
it was impossible to determine whether
negative responses were due to sex of sub-
ject (suggesting fear and conflict peculiar
to females) or to sex of the actor in the cue
(suggesting that males and females hold
common sex role stereotypes). Horner's
studies also left open many developmental
and educational policy questions, as she
herself pointed out (Horner, 1968, Chapter 
4). A study by Monahan, Kuhn, and 
Shaver (1974) has addressed some of these 
questions. One hundred and twenty 10- to 
16-year-old boys and girls responded to the 
John or Ann cue in a completely crossed 
design. Both sexes gave more negative re- 
sponses to the Ann cue, indicating that sex 
role stereotypes are largely responsible for 
the “fear of success” imagery. There was 
also some evidence, however, that moti- 
vational differences were being tapped: 
Females often expressed anxiety and con- 
flict in describing Ann’s fate while the 
males often expressed hostility. The overall 
frequency of negative responses declined 
with age, contrary to a trend described by 
Horner (1972), and more “fear of success” 
was expressed in response to the John cue 
than Horner’s study would lead one to ex- 
pect. (This finding has been obtained in 
other recent investigations as well; see the 
review by Tresemer, 1973.) 

Horner’s experimental results (1968, 
1970) and subjects’ answers to questions 
concerning competition convinced her that 
the “motive to avoid success” is aroused 
most strongly in competitive situations. 
Considering the prevailing sex roles and the 
emphasis on education as a means to “get 
ahead” in a competitive society, she specu- 
lated that the developmental trend toward 
greater fear of success in older college fe- 
males was due to the increasingly signifi- 
cant competition for valued positions in 
academic and professional life. Consistent 
with this is her finding (Horner, 1972) that 
academically superior female students— 
those most capable of competing with males 
for professional positions—are more likely 
than their less able classmates to show fear 
of success. In several places, Horner has 
suggested that such findings may argue 
against the prevailing emphasis on coeduca- 
tion, since coeducation increases the salience 
of cross-sex competition for academic and 
professional success. This possibility is 
especially important considering a recently 
released report from the Carnegie Commis-
sion on Higher Education (1973), Oppor-
tunities for Women in Higher Education.
The report indicates that a high proportion
of successful women come from noncoed 
colleges. Although this could be due partly 
to selective enrollment, 


there are reasons for believing that the experience 
of attending a women’s college is partially respon- 
sible. In women’s colleges, female students are not 
reluctant to participate actively in class discussion 
for fear of losing their feminine appeal in the eyes 
of male students. They have far greater oppor- 
tunity to gain experience in leadership roles in 
campus organizations and activities than women in 
coeducational institutions, where the top leader- 
ship positions nearly always go to men [p. 73]. 


The report also notes that students in 
women’s colleges are more likely to enter 
traditionally male fields, such as science, 
and are more likely to participate in 
athletics. 

The present study was designed to ex- 
plore the development of fear of success a 
bit further, with particular emphasis on how 
such development is affected by coeduca- 
tion. Male and female high school students 
from the same geographical area and from 
similar socioeconomic and cultural back- 
grounds were given either the Ann or the 
John cue as a measure of fear of success. 
Some of them were attending a coed high 
school and some were not. Moreover, part of 
each group had attended noncoed elemen- 
tary schools.* 


METHOD 


Subjects 


Two hundred and fifty-two high school seniors 
attending Jewish private schools in a middle- to 
upper-middle-class Brooklyn community com- 
pleted the fear of success measure in the spring of 
1973. After the measure had been taken, each sub- 
ject was asked what he or she thought the study 
was about (Horner’s work had been discussed in 
national periodicals that were potentially available 
to the subjects). Twelve people were eliminated 
before analyses were performed because they were 
aware of the purpose of the measure. The number 
of males and females in each school and cue 
category is shown in Table 1. 


Procedure 
Subjects completed the fear of success mea- 
sure in their classrooms after being assured that 


*In this paper, “elementary school" refers to 
Grades 1 through 8. 
the results would be seen only by researchers at
Columbia University and not by the subjects’ 
teachers. Some received booklets containing the 
John cue, others received the Ann cue. An attempt 
was made to equate the number of males and fe- 
males in each condition, but this was not always 
successful. All instructions were given in written
form. These asked the subject to “write a story”
based on the following sentence: “After first term
finals, Ann (John) finds herself (himself) at the
top of her (his) medical school class.” “You should
tell (a) what has happened in the past that led
up to this; (b) what Ann (John) thinks about
this; (c) what will happen to Ann (John) in the
future.” Each subject was given as much time as
necessary. After completing a story, each subject
answered the question mentioned previously about 
the probable purpose of the study. All protocols 
were adequate for scoring and, with the exception 
of those from the 12 nonnaive subjects, all were 
included in the analyses (N = 240).


Scoring 


The initial scoring procedure was identical to 
Horner's (1970, p. 59). If a protocol expressed an 
overall positive attitude toward the achievement 
and toward the actor, it was scored as positive. 
If it indicated any negative attitudes toward the 
achievement or the actor or specified negative 
consequences of the achievement, it was scored as 
negative. Two raters, one male and one female, 
scored the protocols independently, agreeing in 
96% of the cases. They also divided the negative 
responses into two categories (agreeing in 93%
of the cases): affiliative concerns (e.g., marriage,
loss of friends, loss of feminine attractiveness) and
nonaffiliative concerns (e.g., monetary considera-
tions, dissatisfaction with work load, claims that
the success was due to cheating). All disagree- 
ments were easily resolved by discussion. 


RESULTS


The percentages of subjects whose pro-
tocols were assigned to the “fear of success”


TABLE 1
PROPORTION OF SUBJECTS GIVING FEAR OF
SUCCESS RESPONSES IN EACH
SEX-SCHOOL-CUE CONDITION

                             Ann cue               John cue
School/sex              Proportion     %      Proportion     %

Coed high school
  Females                  9/22      40.9        4/17      23.5
  Males                   12/39      30.8        2/23       8.7

Noncoed high school
  Females                  6/38      15.8        2/34       5.9
  Males                    9/33      27.3        4/34      11.8


TABLE 2
PROPORTION OF FEMALE SUBJECTS WHO SHOWED
FEAR OF SUCCESS IN RESPONSE TO THE
ANN CUE CLASSIFIED BY TYPES OF
SCHOOLS ATTENDED

                                    High school
                             Coed                 Noncoed
Elementary school       Proportion     %      Proportion     %

Noncoed                    0/9        0          1/17       5.9
Coed                       9/13      69.2        5/21      23.8


category are given in Table 1 for each con- 
dition. The results replicate Horner's basic 
finding: For each sex of subject and type of 
school category, the proportion of subjects
showing fear of success is higher for the Ann
cue than it is for the John cue (overall χ² =
8.71, df = 1, p < .005). Of special interest 
is the significant difference between the 
proportion of fear of success stories evoked 
by the Ann cue in females from coed 
schools and the corresponding proportion 
for females from noncoed schools (40.9% 
for the coed school, 15.8% for the noncoed 
school: χ² = 4.43, df = 1, p < .05; the
corresponding percentages for males, 30.8% 
and 27.3%, are not significantly different). 
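
The overall Ann versus John comparison can be retraced from the cell counts in Table 1. The sketch below pools the four sex-by-school groups for each cue and computes a 2 x 2 chi-square with Yates' continuity correction. The use of that correction is an assumption, since the text does not say how the statistic was computed; the helper name is illustrative.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    # Shrink |ad - bc| by n / 2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (fear of success responses, group size) for each row of Table 1
ann_cells = [(9, 22), (12, 39), (6, 38), (9, 33)]
john_cells = [(4, 17), (2, 23), (2, 34), (4, 34)]

ann_fos = sum(k for k, _ in ann_cells)
ann_n = sum(n for _, n in ann_cells)
john_fos = sum(k for k, _ in john_cells)
john_n = sum(n for _, n in john_cells)

chi2 = yates_chi_square(ann_fos, ann_n - ann_fos, john_fos, john_n - john_fos)
print(f"Ann: {ann_fos}/{ann_n}, John: {john_fos}/{john_n}, chi-square = {chi2:.2f}")
```

Under this assumption the pooled table gives a value close to the overall figure of 8.71 reported above.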
Further analyses (summarized in Table 
2) revealed that in all but one case in which 
females responded negatively to the Ann 
cue, the subjects had attended a coed ele-
mentary school (χ² = 10.25, df = 1, p <
.005).4 Of the nine girls who attended a
noncoed elementary school and a coed high 
school, none showed fear of success. Turning 
to the 34 girls who attended coed elemen- 
tary schools, we find that the proportion 
showing fear of success was affected sig- 
nificantly by the kind of high school they 
attended (69.2% for the coed high school, 
23.8% for the noncoed high school; χ² =
5.09, df = 1, p < .025). It might also be




*One of the authors visited a kindergarten
classroom in one of the coed elementary schools
and found, in addition to distinct play areas for 
boys (blocks, planes, etc.) and girls (a model 
kitchen), two posters proclaiming (in pink) “what 
little girls are made of” (sugar and spice...) and 
(in blue) “what little boys are made of” (snails 
and puppy dog tails). 
noted that the figure of 69.2% is quite simi-
lar to the 65% obtained by Horner, most of
whose subjects—students at the University 
of Michigan—had probably always attended 
coed schools. 

Finally, Table 3 shows the frequency with 
which negative responses to both Ann and 
John cues fell into affiliative or nonaffilia- 
tive categories as a function of sex of sub- 
ject and type of school. It is evident that 
negative responses to the Ann cue from both 
male and female subjects fell primarily into 
the affiliative category (69.4%), while the 
reverse pattern obtained for the John cue 
(only 16.7% in the affiliative category). A 
2 x 2 (Ann-John/affiliative-nonaffiliative)
chi-square test indicates that the overall
pattern is highly reliable (χ² = 8.15, df =
1, p < .005).
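
The affiliative percentages and the 2 x 2 test can be retraced from the counts in Table 3. The sketch below pools the four sex-by-school rows for each cue, then computes the percentage of affiliative responses and a chi-square with Yates' continuity correction. The correction is an assumption (the text does not name the formula), and the helper name is illustrative.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (affiliative, nonaffiliative) counts for each row of Table 3
ann_rows = [(6, 3), (7, 5), (6, 0), (6, 3)]
john_rows = [(1, 3), (0, 2), (1, 1), (0, 4)]

ann_aff = sum(a for a, _ in ann_rows)
ann_non = sum(b for _, b in ann_rows)
john_aff = sum(a for a, _ in john_rows)
john_non = sum(b for _, b in john_rows)

pct_ann = 100 * ann_aff / (ann_aff + ann_non)
pct_john = 100 * john_aff / (john_aff + john_non)
chi2 = yates_chi_square(ann_aff, ann_non, john_aff, john_non)

print(f"Ann affiliative: {pct_ann:.1f}%, John affiliative: {pct_john:.1f}%")
print(f"chi-square = {chi2:.3f}")
```

The pooled percentages match the 69.4% and 16.7% given above, and the corrected chi-square lands within rounding distance of the reported 8.15.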


Discussion 


The results are compatible with Horner’s 
notion that increased cross-sex competition 
increases fear of success in females. It ap- 
pears, however, that the motive is largely 
formed before a girl reaches high school and 
that females who have not learned to avoid 
success in the presence of male classmates 
in elementary school do not “fear success.” 
For those who do attend coed elementary 
schools, fear of success is more common, 
and it appears to be increased by attendance 
at a coed high school. 

Obviously we must be cautious in gen- 
eralizing from these results. The subject 
population may not be representative of all 
high school students. The measure of success 
avoidance was quite primitive; only after 
the study was completed did a more precise 
measuring system involving different cues
and different scoring procedures become 
available (Horner, Tresemer, Berens, & 
Watson, 1973). Also, we cannot be completely
certain that the results are due solely to
school rather than home environment. We tried
to rule out the effect of home by choosing
schools that were as similar as possible while
differing on the coed-noncoed dimension. The
two high schools are only four blocks apart;
the male and female sections of the noncoed
school are located in adjacent buildings
(although there is no contact between males
and females during the school day); the
students at the two schools come from similar
religious backgrounds, neighborhoods, social
classes, and, in several cases, from the same
families. In discussions with the principals
of the two schools, we were unable to determine
any systematic differences between families who
chose to send their children to the different
schools.

TABLE 3
NUMBER OF SUBJECTS WITHIN EACH SEX-SCHOOL-CUE CONDITION WHO GAVE
AFFILIATIVE OR NONAFFILIATIVE FEAR OF SUCCESS RESPONSES

School/sex | Ann cue: Affiliative | Ann cue: Nonaffiliative | John cue: Affiliative | John cue: Nonaffiliative
Coed high school
Females | 6 | 3 | 1 | 3
Males | 7 | 5 | 0 | 2
Noncoed high school
Females | 6 | 0 | 1 | 1
Males | 6 | 3 | 0 | 4

Our results have both theoretical and 
practical implications. They indicate that 
developmental studies of fear of success 
should include preadolescent age groups and 
should treat school environment as an im- 
portant variable. On the practical side, the 
results seem to argue for noncoed schools. 
Since there may be other arguments that 
favor coeducation, and considering that 
most schools in the country would not be 
likely to change their commitment to co- 
education even if the present results proved 
reliable, it would be more realistic to try to 
design ways to reduce the motivational 
handicap developed by females under current
arrangements. Some of the relevant
issues, for example, the elimination of
stereotypes commonly presented in text- 
books and in physical education courses, 
have already been addressed by concerned 
students and their parents. These are un- 
doubtedly only a beginning. 

RELATIONSHIPS AMONG PERCEPTUAL-MOTOR TASKS:
TRACING AND COPYING*

Subjects of several ages were asked to copy, trace, or point to the
beginning of a set of shapes. The aim was to determine the extent to
which sequential behavior (starting points and stroke progressions)
was consistent across tasks. With one notable exception, behavior was
consistent. The exception was the extent to which children started
at the left rather than the right, a behavior that was far less frequent
when tracing or pointing to the beginning of a shape than when copying.
The results point to the feasibility of using sequential analyses to
explore questions of generality and transfer on graphic tasks.

When children copy geometric shapes,
they display marked sequential patterns.
They start, for example, at the top more
often than the bottom, at the left more often
than the right, and with a vertical rather
than a horizontal stroke (Gesell & Ames,
1946; Goodnow & Levine, 1973; Hanfmann,
1933). Such patterns appear before children
learn to write and may help account for the
occurrence of particular errors in copying
or writing. Shapes like "d," for instance,
are reversed more often than shapes like
"b" (Goodnow & Levine, 1973; Lewis &
Lewis, 1965), an asymmetry of error that
appears to stem from children's preferences
for starting with a vertical stroke and
progressing toward the right.


The sources of such directional behavior 
are not clear. They might lie, for instance, 
in some factor specific to the copying task, 
to handedness, or to the nature of the English
script. The present study examines the
effect of the task, asking whether sequential 
behavior is the same on three tasks: copy- 
ing, tracing, and pointing to the beginning 
of a shape. 

As a variable, the selection of task effects 
is based on indications that neither handedness
nor the nature of the English script
offers a sufficient explanation for the direc- 
tional behavior. Left-handed and right- 
handed children appear to show similar pat- 
terns, the main difference being the extent 
to which figures are started at the left 
(Gesell & Ames, 1946). Children in the 
United States and Israel also display 
broadly similar patterns, the most promi- 
nent differences being in the direction for 
drawing circles and the age at which most 
children cease copying shapes with a single, 
continuous stroke (Goodnow, Friedman, 
Bernbaum, & Lehman, 1973). 

Among tasks, the use of pointing allows a 
check on the effect of using a graphic tool. 
The selection of tracing is based on the frequent
use of tracing as a technique for improving
performance in copying or writing,
a usage dating back to the early Greeks and 
Romans (Fernald, 1943, pp. 27-29). Evi- 
dence for the effectiveness of tracing, how- 
ever, is difficult to find. Most reports of 
positive effect (e.g., Montessori, 1966) are
not experimentally based. Within the small 
set of experimental studies, Brittain (1969), 
Goodson (1967) and Rand (1971) report 
no benefit, while Bee and Walker (1968) re- 
port negative transfer. This lack of fit be- 
tween classroom practice and experimental 
results suggests at least that the relation- 
ship between the two tasks needs additional 
exploration. It might well be, for example, 
that children do not spontaneously use the 
same directional patterns when tracing as 
when copying. In such a case, the lack of 
transfer might stem from different patterns 
being practiced on the two tasks. 

A pilot study supported the possibility of 
a difference between performance when 
tracing and when copying. One group of 
children aged 5 to 7 years (n = 60) was 
asked to copy two shapes (a triangle and a 
diamond). A group in the same age range, 
from a different school (n = 60), was asked 
to trace the same shapes, going over the 
dittoed outlines with a darker colour. The 
two groups did not differ in the extent to 
which they started at the top or in the ex- 
tent to which they used a single continuous 
line. They did differ, markedly, however, in 
the extent to which the first stroke was 
down the left rather than the right side of 
each shape. On the triangle, for instance, 
84% of the children first drew down the 
left side when copying. Only 40% did so 
when tracing, a difference significant, on 
chi-square analysis, beyond the .01 level. A 
similar difference occurred with the diamond 
shape. The use of two different schools, how- 
ever, left questions about the comparability 
of the subjects. In addition, treating 5- to 
7-year-olds as a single group (a treatment 
based on one school being ungraded) left 
questions about effects from changes in age 
or experience. The present study was de- 
signed to cover these questions. 


METHOD 


Subjects 


For Condition 1 (copying versus tracing), 76 
subjects were drawn from four age groups. One 
group (n = 20) was undergraduate; the other three 
were children attending a summer day camp in 
Maryland after completing either kindergarten 
(n = 20), Grade 1 (n = 16), or Grade 2 (n = 20). 
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Almost all of the children attended, during the 
school year, the private school operating the day 
camp. Median chronological age (CA) in the four 
groups was 20, 6:2, 7:1, and 7:11 years, respec- 
tively. 

For Condition 2 (copying versus pointing to the 
beginning of a shape), 52 subjects were drawn from 
three grades in a public school within Fairfax 
County, Virginia. Children were in mid-year kin- 
dergarten (n = 19), Grade 2 (n = 18), or Grade 
4 (n = 16). Median CA was 5:10, 7:8, and 9:9 
years, respectively. 

For both conditions, all subjects spontaneously 
used the right hand. 


Stimulus Materials and Procedure 


Three shapes provided a comparison of tasks: 
triangle, inverted V, and U-shape. The subjects 
first copied the three shapes, embedded in a set of 
15. Following a distractor task, subjects either 
traced over the test shapes or pointed to where 
they thought the shapes began. 


Scoring 


Productions were first scored for correctness. 
The liberal criteria are given in Goodnow and Le- 
vine (1973). Correct copies were then scored for 
the observance of three sequential principles. One
of these dealt with the overall sequence of strokes,
asking whether the sequence showed a thread or a
broken line. In a threaded pattern, the subjects 
lay out a shape as if they were laying out a thread. 
They may momentarily lift their pencils but no 
jump occurs in the path. (Scoring for this principle 
applied only to Condition 1.) The other two prin- 
ciples dealt with starting points, asking whether 
or not subjects started at a topmost point, and 
whether they started at a leftmost point or with a 
stroke to the left. (Scoring for these principles ap- 
plied to both conditions.) 

For each shape, subjects were next scored for 
whether they displayed same or different behavior 
across tasks, for example, started at the left or not 
on both, started at the top or not on both, threaded 
or not on both. In Condition 2, these counts were 
reduced to a single score, namely the use of same
versus different starting points on both tasks.

The first type of scoring (observance of princi- 
ples) provides a picture of what subjects did on 
each task. The second type (presence of same-dif- 
ferent behavior) allows a statistical analysis of 
similarity or consistency across tasks. 
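
The two types of scoring can be illustrated with a small sketch. Everything named here is hypothetical: the `Production` record, its field names, and the example values are assumptions for illustration, not the scoring sheets used in the study. In particular, treating "same starting point" in Condition 2 as agreement on both the left and the top principle is one possible reading of the reduction described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """Sequential features of one subject's production of one shape on one task."""
    started_left: bool  # started at a leftmost point or with a left stroke
    started_top: bool   # started at a topmost point
    threaded: bool      # overall stroke sequence formed an unbroken path


def same_different(copy: Production, other: Production) -> dict:
    """Per principle, True when the subject behaved the same way on both tasks."""
    return {
        "left": copy.started_left == other.started_left,
        "top": copy.started_top == other.started_top,
        "thread": copy.threaded == other.threaded,
    }


def same_starting_point(copy: Production, other: Production) -> bool:
    """Condition 2 style reduction to a single same-versus-different score.

    Assumption: the starting point counts as the same only when both
    the left and the top principles agree across tasks.
    """
    return (copy.started_left == other.started_left
            and copy.started_top == other.started_top)


# Hypothetical subject: starts at the left when copying, not when tracing.
copied = Production(started_left=True, started_top=True, threaded=True)
traced = Production(started_left=False, started_top=True, threaded=True)

scores = same_different(copied, traced)
print(scores)
print(same_starting_point(copied, traced))
```

Counting the `True` and `False` entries per age group gives the same-different frequencies on which tests of consistency across tasks can be run.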


RESULTS 


Between copying and tracing, a large difference
appeared in the incidence of left
starting (Figure 1) but not in the incidence
of other sequential features (Table 1). The
difference was strongest with the kindergarteners.
It diminished with age but was
still present to some extent among children
who had completed Grade 2 (Figure 1).
Statistically, the three groups of school children
were not significantly different from one another
in the incidence of same-different behavior
across tasks, the only consistent differences
being between kindergarteners and
adults. For these two groups and for left-starting
behavior, Fisher's exact probability
tests yielded values of p = 0.006 for the
triangle, p = 0.0042 for the inverted V, and
p = 0.0016 for the U-shape.

[Figure 1 plot omitted. Panel label: Starting behavior (* = starting point).]
FIGURE 1. The incidence of left-starting behaviors when copying and tracing
(Condition 1) or when copying and pointing to the beginning of a shape (Condition 2).

Between copying and pointing, the incidence
of starting at the top of the triangle
showed variation among the two older
groups (Table 1). The most marked difference
across tasks, however, again centered
around the incidence of left-starting behavior
(Figure 1). On the triangle and on the
inverted V, a smaller proportion of subjects
chose the left base-point when pointing as
compared with copying. The effect was
strongest in kindergarten: No kindergarteners,
for example, pointed to the left base-point
as the beginning of the triangle, 53%
choosing the top and 47% choosing the right
base-point. On the inverted V, a weak form
of this type of difference was still present
among children in Grade 2. On the U-shape,
a smaller proportion of subjects chose any
left point, top or bottom, when pointing as
compared with copying. The difference was
restricted to kindergarten.

TABLE 1
PROPORTION OF SUBJECTS IN FOUR AGE GROUPS EXHIBITING
TWO ASPECTS OF DIRECTIONAL BEHAVIOR

Starting at top (Shapes 1-3), then Threading (Shapes 1-3)
Grade | n | Shape 1 Copy | Trace | Shape 2 Copy | Trace | Shape 3 Copy | Trace | Shape 1 Copy | Trace | Shape 2 Copy | Trace | Shape 3 Copy | Trace
Kindergarten | 20 | .50 | .70 | .35 | .35 | 1.00 | 1.00 | .55 | .60 | .60 | .65 | .95 | .75
Grade 1 | 16 | .66 | .86 | .44 | .62 | 1.00 | .94 | .56 | .18 | .56 | .36 | .94 | .36
Grade 2 | 20 | .90 | .55 | .70 | .65 | 1.00 | 1.00 | .15 | .25 | .30 | .35 | .80 | .35
Adult | 20 | .75 | .55 | .35 | .10 | 1.00 | .95 | .20 | .45 | .75 | .80 | .80 | .60

Starting at top (Shapes 1-3)
Grade | n | Shape 1 Copy | Point | Shape 2 Copy | Point | Shape 3 Copy | Point
Kindergarten | 19 | .47 | .53 | .45 | .39 | .74 | .89
Grade 2 | 18 | .33 | .55 | .39 | .11 | 1.00 | 1.00
Grade 4 | 16 | .47 | .67 | .19 | .31 | 1.00 | .94

Note. Shape 1 = triangle; Shape 2 = inverted V; Shape 3 = U-shape.

Statistical analyses for the comparison 
between copying and pointing were based on 
the number of subjects in each age group 
who used the same starting point on the two 
tasks. With this measure, a series of chi- 
squares and Fisher's exact probability tests 
yielded differences significant beyond the .05 
level between kindergarteners and second 
graders and between second graders and 
fourth graders on the inverted V and on the 
U-shape. On the triangle, differences in con- 
sistency across tasks were undercut by the 
shift, in the two older groups, toward more 
top starting when pointing than when copy- 
ing. 

In sum, children after kindergarten and 
Grade 1 appear to be still moving toward 
consistency across tasks. 


Replication 


Although sizable, the difference between 
copying and tracing was less than that ob- 
tained in the pilot study. Such variation 
might result from a somewhat unstable phe- 
nomenon or from the fact that the subjects 
in Condition 1 had already completed a year 
in kindergarten and had been taught to 
write. To explore these possibilities, the 
comparison of copying with tracing was re- 
peated with an additional group. The chil- 
dren in this group were living in an area 
where few children attend preschool and 
where county policy actively discourages 
formal training, including training in writ- 
ing, in kindergarten. For the 60 subjects, 
testing was in the middle of the school year. 
Median CA was 5:5 years. The shapes were 
an inverted V and a U-shape. Procedure 
was the same as in Condition 1. 

This comparison established the stability 
of the difference between tracing and copy- 
ing. As in Condition 1, no significant differ- 
ences appeared in the incidence of threading 
or of starting at the top. A significant differ- 
ence again occurred, however, in the inci- 
dence of starting at the left or with a left 
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stroke, with left-starting occurring far less 
often when tracing than when copying. The 
size of this difference was similar to that 
obtained in the pilot study. On the inverted 
V shape, for example, 83% of the subjects 
started with a left stroke when copying, 
whereas only 35% did so when tracing. The
difference in performance between tracing 
and copying was similar for the U-shape: 
79% started at the left when copying, while 
only 33% did so when tracing. In general, 
female subjects showed consistent left-start- 
ing behavior when copying and shifted to 
mixed behavior when tracing (some starting 
at the left, some at the right). In contrast, 
male subjects showed somewhat less con- 
sistent left-starting when copying and a 
shift to starting at the right when tracing. 
On the U-shape, for example, 100% of the 
female subjects started at the left when 
copying as opposed to 47% when tracing.
Males, on the other hand, started at the left 
56% of the time when copying and only 19% 
of the time when tracing. 


Discussion 


The results suggest that most aspects of 
directional behavior in copying are not spe- 
cific to the copying task. The tendency to 
start at the top rather than at the bottom, 
for example, was common to all three tasks: 
copying, tracing, and pointing to the be- 
ginning of a shape. In addition, copying and 
tracing did not differ in the extent to which 
they elicited the pattern of using a single 
continuous line. The prominent area of dif- 
ference lay in the extent to which children 
displayed starting at the left on all three 
tasks, this pattern being markedly less fre- 
quent when tracing or pointing to the begin- 
ning of a shape than when copying. Such 
variations suggest that the different aspects 
of directionality may have somewhat dif- 
ferent sources. At the least, left-right direc- 
tionality appears to be the least stable aspect 
of directionality or the slowest to achieve 
constant use. 

In addition, the results suggest a feasible 
way of looking at transfer across tasks. One 
condition for positive transfer may be the 
use of the same path on both tasks, that is, 
the insistence by a teacher that the pattern
to be followed in copying or writing be followed
in tracing. Such a condition would fit
with Montessori's reports of positive trans- 
fer from tracing, since the Montessori pro- 
cedure explicitly calls for using the same 
path on all tasks (e.g., Montessori, 1966, p. 
266). In contrast, negative transfer (Bee &
Walker, 1968) or no transfer (Brittain,
1969; Goodson, 1967; Rand, 1971) from 
tracing to copying may be associated with 
allowing children to follow their own paths 
when tracing. Left to their own devices, 
children may not be practicing the same be- 
havior on both tasks, particularly as far as 
starting at the left is concerned. Such con- 
ditions are unlikely to account for all trans- 
fer effects across tasks, but they suggest a 
minimal safeguard for teachers to adopt 
when using the technique of tracing and a 
way of introducing some order into a litera- 
ture marked by discrepancies. 
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ACQUISITION AND NONSPECIFIC TRANSFER EFFECTS 
IN PROSE LEARNING AS A FUNCTION OF 
QUESTION FORM’ 
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Acquisition and nonspecific transfer effects in prose learning were 
studied by presenting a different 2,000-word prose passage on each of 
three successive days, with each day consisting of four reading and 
testing trials. Testing employed either multiple-choice or completion 
questions. Performance failed to improve over passages, indicating 
an absence of nonspecific transfer effects. Performance on the latter 
trials tended to deteriorate as a function of passage in the multiple- 
choice condition. The percentage errors after the first correct response 
were also greater in the multiple-choice than in the completion condi- 
tion. The findings provided no evidence that learning-to-learn effects 
exist with prose materials and suggested that repeated use of the 
multiple-choice test produces interference at the higher levels of
learning.


The study of nonspecific transfer includes 
the effect of warm-up and the so-called 
learning-to-learn effect (Postman, 1969). 
Typically, the study of learning-to-learn 
has involved the use of traditional verbal 
learning tasks such as paired-associate 
learning (e.g., Keppel & Postman, 1966),
serial learning (e.g., Meyer & Miles, 1953) 
and, more recently, free recall (e.g., Post- 
man, Burns, & Hasher, 1970). When prose 
passages have been used, however, data 
pertinent to demonstrating nonspecific 
transfer effects usually have been generated 
only incidentally, with the evidence sug- 
gesting that nonspecific transfer effects, 
when obtained, are of small magnitude 
(e.g., Crouse & Idstein, 1972). 

The present study was designed to determine
whether nonspecific transfer effects
could be demonstrated with successive
acquisition of samples of unrelated prose
materials. The question of whether one would
expect nonspecific transfer effects to occur
in prose learning may be answered positively
or negatively. On the one hand, it
could be argued that the process of learning
from prose materials is probably quite similar
to that found in other learning tasks;
therefore, one would expect to find nonspecific
transfer effects in the prose context. On
the other hand, it could be argued that
learning from given samples of prose material
will depend not upon training in the
task situation but upon factors such as the
person's knowledge of the topic and his general
level of comprehension skills (Carroll,
1972).

In addition to providing information
regarding the possible occurrence of nonspecific
transfer effects in prose learning, the
present experiment was also designed to
study the effects of question form upon
acquisition and to determine whether the
question form variable differentially influences
nonspecific transfer effects. Multiple-choice
and completion questions were employed,
with the identical stem of the question
used in both cases. Theoretically, the
multiple-choice and completion questions 
may be considered as recognition and recall 
procedures, respectively. Assuming that a 
recall procedure involves a retrieval process 
and retrieval in a recognition situation is 
absent or at least reduced compared to the 
recall task (e.g., Underwood, 1972), it was 
expected that completion questions might 
produce the development of retrieval skills 
in the given task situation. This would lead 
to nonspecific transfer effects, while the uti- 
lization of such skills would not be found, 
at least to the same extent, in a multiple- 
choice recognition task. 


METHOD 


Materials 


Three prose passages from the World Book 
Encyclopedia Yearbook (1968) were edited and 
titled. The three passages pertained to the cul- 
tural and geographical development of Formosa, 
the development of a river project in South 
Vietnam, and a discussion of Tikal, a Mayan city. 
The length of each passage was approximately 
2,000 words. The passages were unrelated with re- 
spect to specific content. 


Design and Procedure 


Each subject was presented a passage for eight 
minutes, followed by a seven-minute test trial. 
Each passage was presented for four such study- 
test trials. The questions were presented in four 
different random orders over the four trials. The 
three passages were presented on three consecutive 
days, with the order of presentation of the three 
passages counterbalanced via the use of a ran- 
domly permuted 3 X 3 latin square. Question form 
was a between-subjects variable. The instructions 
stated that the subjects were to learn what they 
could from the prose passage in the time allotted 
and that they would be tested immediately after 
they finished reading. The nature of the test was 
not indicated. 
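
The counterbalancing scheme can be sketched in code. This is a hedged illustration only: the passage labels are shorthand for the three topics, the seed is arbitrary, and the even split of five subjects per passage order within each question-form condition is an assumption (the article says only that assignment used a table of random numbers). The square is built from a cyclic 3 × 3 Latin square whose rows, columns, and symbols are then randomly permuted.

```python
import random

PASSAGES = ["Formosa", "South Vietnam river project", "Tikal"]  # shorthand labels


def random_latin_square(items, rng):
    """Cyclic Latin square with rows, columns, and symbols randomly permuted."""
    n = len(items)
    square = [[(r + c) % n for c in range(n)] for r in range(n)]
    rng.shuffle(square)                      # permute rows
    cols = list(range(n))
    rng.shuffle(cols)                        # permute columns
    square = [[row[c] for c in cols] for row in square]
    labels = list(items)
    rng.shuffle(labels)                      # permute symbols
    return [[labels[v] for v in row] for row in square]


def assign_subjects(n_per_condition, orders, rng):
    """Randomly assign subjects to question form and passage order.

    Assumption: each order receives an equal share of each condition.
    """
    assignment = {}
    subject = 0
    per_order = n_per_condition // len(orders)
    for condition in ("multiple-choice", "completion"):
        slots = [i for i in range(len(orders)) for _ in range(per_order)]
        rng.shuffle(slots)
        for order_index in slots:
            assignment[subject] = (condition, tuple(orders[order_index]))
            subject += 1
    return assignment


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
orders = random_latin_square(PASSAGES, rng)
assignment = assign_subjects(15, orders, rng)

for day_order in orders:
    print(" -> ".join(day_order))  # passage for Day 1, Day 2, Day 3
```

Each row is one order of presentation over the three days, and each passage appears exactly once in every day position, so passage content is balanced against day of testing.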


Questions 


For each of the three passages, 21 questions
were constructed in either a completion or a
multiple-choice form, with the two forms having
identical stems. From preliminary testing involving 24
subjects, the 10 best questions were selected, with
selection based upon the consistency with which
the question was rated as factual in [illegible]
and upon the moderate difficulty level of
the question.
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Subjects 


There was a total of 30 subjects, with 15 sub- 
jects serving in the multiple-choice and 15 in the 
completion condition. The subjects were assigned 
to the two conditions and to the three passage or- 
ders via a table of random numbers. The subjects 
were introductory psychology students. 


RESULTS 


The mean correct responses for the three 
successive passages were 6.70, 7.07, and 
6.83, respectively. Analysis of variance re- 
vealed that these differences are not signifi- 
cant (F < 1.00, df = 2/56), thus indicating 
the absence of significant nonspecific transfer
effects. Furthermore, the Passage ×
Question Form interaction is not significant
(F < 1.00, df = 2/56). 

With respect to the question form variable,
multiple-choice questions yielded significantly
more correct responses (M =
7.44) than completion questions (M = 6.29,
F = 6.66, df = 1/28, p < .05). In addition,
the Question Form × Trials interaction was
significant (F = 4.38, df = 3/84, p < .01),
as was the Passage × Question Form ×
Trials interaction (F = 3.58, df = 6/168,
p < .01).
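
The degrees of freedom reported above follow from the design: 30 subjects, question form varied between subjects (two groups of 15), and passages (3) and trials (4) varied within subjects. The sketch below works them out under the assumption that a standard split-plot analysis of variance was used, which the article does not spell out; the function name and dictionary keys are illustrative.

```python
def mixed_design_df(n_per_group, groups, passages, trials):
    """(effect df, error df) for one between- and two within-subjects factors.

    Assumption: a conventional split-plot ANOVA in which each within-subjects
    effect is tested against its own interaction with subjects within groups.
    """
    subjects_within = groups * (n_per_group - 1)
    a, p, t = groups - 1, passages - 1, trials - 1
    return {
        "Question Form": (a, subjects_within),
        "Passage": (p, subjects_within * p),
        "Passage x Question Form": (a * p, subjects_within * p),
        "Trials": (t, subjects_within * t),
        "Question Form x Trials": (a * t, subjects_within * t),
        "Passage x Trials": (p * t, subjects_within * p * t),
        "Passage x Question Form x Trials": (a * p * t, subjects_within * p * t),
    }


df = mixed_design_df(n_per_group=15, groups=2, passages=3, trials=4)
for effect, (effect_df, error_df) in df.items():
    print(f"{effect}: df = {effect_df}/{error_df}")
```

The values 1/28, 2/56, 3/84, and 6/168 match those reported for the question form, passage, Question Form × Trials, and three-factor tests, respectively.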

The data pertinent to these two signifi- 
cant interactions are shown in Figure 1. 
The significant three-factor interaction is 
apparently due to a progressive change over 
the passages of the Trials × Question Form
interaction. Specifically, acquisition of first- 
passage information increased in virtually a 
parallel manner for the multiple-choice and 
completion questions; however, convergence 
of performance occurred for the two types 
of questions over trials for the second and 
third passages. It is of particular interest
that, for the second and third passages,
multiple-choice performance did not
improve from the third to the fourth trial.
Indeed, inspection of the data of the three
passages indicates that performance on the 
final trial in the multiple-choice condition 
decreased as a function of passage. Thus, 
the data suggest that completion questions 
produced consistent improvement over 
trials for all three passages, although there 
is no learning-to-learn effect; however,
multiple-choice questions produce consistent
improvement only on the first passage, while
on the second and third passages,
multiple-choice performance deteriorates at the
higher levels of acquisition.

[Figure 1 plot omitted. One panel per passage; legend: completion, multiple-choice;
y-axis: mean correct responses; x-axis: trials 1 to 4.]
FIGURE 1. Mean correct responses as a function of trial for the completion and
multiple-choice question forms for each successively presented passage.

The Figure 1 data also suggest the possi- 
bility that Trial 1 performance may be a fac- 
tor in the Passage X Trial X Question Form 
interaction, since the data at least suggest 
that significant positive transfer may have 
occurred on Trial 1 as a function of pas- 
sage, especially for the multiple-choice con- 
dition. However, analysis of the Trial 1 
data indicated that although multiple-choice
Trial 1 performance was significantly
superior to that found with completion
questions (F = 22.18, df = 1/29, p <
.01), a result that is not surprising, there
nevertheless was not a significant effect of
Passage (F = 1.20, df = 2/56, p > .05) or
of the Passage × Question Form interaction
(F < 1.00, df = 2/56).

A more detailed analysis of the comple- 
tion and multiple-choice conditions pro- 
vided the following information. First, as 
one would expect, the completion condition 
produced a number of omission errors, 
namely, .48, .41, and .45 of the total number 
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of errors for the three respective passages. 
Second, the percentage errors after the first
correct responses were tabulated by determining
the number of opportunities for error
after the first correct response for each
passage for each subject, dividing the number
of opportunities into the number of errors,
and multiplying by 100. The three
respective passages of the multiple-choice
condition yielded a percentage error after 
the first correct response of 7.3%, 4.9%, and 
8.5%, respectively ; for the completion condi- 
tion, the respective percentages were A%, 
3.2%, and 3.5%. Analysis revealed a significant
difference of question form condition
(F = 5.57, df = 1/28, p < .05), thus indicating
that the multiple-choice procedure
involved some difficulty in maintaining
stability of correct response. Tabulation of the
errors made after the first correct response
indicated that over the three passages, errors
were made on 10 different responses in
the completion condition and 42 different
questions in the multiple-choice condition.
Moreover, if 2 errors after the first correct
response occurred on the same question, the
probability was .76 that the same error
would be repeated in the multiple-choice
condition. There were 17 cases of 2 or more
errors after the first correct response in the
multiple-choice condition and only 5 in the
completion condition. Of these 5, 3 involved
repetition of the same error. Thus, in the
multiple-choice condition, there is considerable
difficulty in maintaining correct responses,
and when errors are made and repeated,
they tend to be the same errors.
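
The measure of percentage errors after the first correct response can be made concrete with a short sketch. The data layout is an assumption: each question is represented by its trial-by-trial outcomes for one subject on one passage (True for correct), and the three example questions are invented for illustration rather than taken from the study.

```python
def errors_after_first_correct(outcomes):
    """Return (opportunities, errors) for one question.

    outcomes: booleans ordered by trial, True meaning a correct response.
    Only trials after the first correct response count as opportunities.
    """
    if True not in outcomes:
        return 0, 0
    first = outcomes.index(True)
    later = outcomes[first + 1:]
    return len(later), later.count(False)


def percent_error_after_first_correct(questions):
    """Errors divided by opportunities, times 100, pooled over questions."""
    opportunities = 0
    errors = 0
    for outcomes in questions:
        opp, err = errors_after_first_correct(outcomes)
        opportunities += opp
        errors += err
    return 100 * errors / opportunities if opportunities else 0.0


# Hypothetical four-trial records for three questions on one passage.
questions = [
    [False, True, True, False],   # 2 opportunities, 1 error
    [True, True, True, True],     # 3 opportunities, 0 errors
    [False, False, False, True],  # first correct on last trial: 0 opportunities
]

pct = percent_error_after_first_correct(questions)
print(f"{pct:.1f}% errors after first correct response")
```

A question first answered correctly on the final trial contributes no opportunities, which is why the denominator is counted per question rather than fixed at the number of trials.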


Discussion 


The failure to obtain nonspecific transfer 
effects in the present study is probably at- 
tributable to one of two factors. One possi- 
bility is that the present experiment may be 
insensitive to nonspecific transfer effects, 
although it is difficult to see in what way 
the experiment may be viewed as deficient 
in this regard. The most probable reason is 
that at the level of the college student, 
prose comprehension may be “content-de- 
termined” and not “task-determined,” that 
is, for experienced readers, the understand- 
ing of prose may primarily depend upon 
how well the individual comprehends the 
specific content of the material; for less ex- 
perienced readers, however, learning from 
prose may be much more related to the 
level of development of comprehension 
skills. From this reasoning, one might ex- 
pect that nonspecific transfer may occur in 
children who are in the earlier phases of 
reading development. It also may be noted 
that Slamecka (1960) obtained nonspecific 
transfer effects with connected discourse, 
using the verbatim recall procedure for 20- 
word sentences. This result may be due to 
the fact that with such a procedure, Sla- 
mecka’s task and measurement procedures 
were more closely related to the traditional 
serial-learning paradigm than those of the 
present study, with Slamecka’s design placing
strong emphasis upon rote memory and
upon minimizing the need to comprehend 
the contents of the passage. 

The second aspect of the present study 
which warrants comment is the following 
finding: With repeated passages, the multi- 
ple-choice procedure tended to yield less ef- 
fective performance at higher levels of ac- 
quisition with an increase in the number of 
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passages acquired. The locus of this effect 
appears to be in the tendency for a greater 
percentage of errors to be made following a 
correct response in the multiple-choice than 
in the completion condition. The most likely 
reason for this result is that the test phase of 
the multiple-choice method involved expo- 
sure to incorrect responses, and these re- 
sponses may have subsequently interfered 
with stabilizing the correct responses. One 
implication of this interpretation is that 
when one uses multiple-choice questions to 
test how well a given passage has been 
learned, subsequent testing may produce 
errors because of interference attributable 
to the incorrect alternatives. The multiple- 
choice method, in other words, may not 
simply measure performance but may be 
detrimental to long-term retention. In effect, 
the multiple-choice procedure of the present 
experiment may be viewed as a modified 
retroactive inhibition paradigm in which the 
stem represents A and the possible responses
represent Associations A-B, A-C,
A-D, and A-E. On four successive presenta-
tions, there is a tendency for the subject to
store the incorrect responses as well as the 
correct responses. When subsequent recall is 
required, the subject has trouble discrimi- 
nating correct from incorrect responses, es- 
pecially at the higher levels of acquisition. 
Because these responses are the last to be 
learned, they would, virtually by definition, 
involve the most difficult questions. This
interpretation is in general agreement with 
the findings of Myrow and Anderson 
(1972), who attributed the occurrence of 
retroactive inhibition in the multiple- 
choice, prose-learning situation to competi- 
tion of responses of original and interpo- 
lated learning. The present study, even 
though not constituting a basic retroactive 
inhibition paradigm, extends this finding in 
that the results suggest the testing proce- 
dure itself may be involved in producing 
interference, and such interference could 
quite conceivably be reflected in subsequent 
testing. 
THEIR EFFECTS ON CROSS-RACE AND CROSS-SEX INTERACTION

Although many public schools are nominally desegregated, the inter-
action among students of varying racial and ethnic backgrounds is
minimal. The present study evaluated the effects of two teaching 
techniques—student teams and instructional games—on the level of 
cross-race and cross-sex interaction in the classroom. Placing students 
on heterogeneous four-member student teams created significantly 
greater cross-race and cross-sex helping and friendship. Team success 
did not have the predicted positive effect on cross-race and cross-sex 
interaction. Playing the instructional game had a marginal effect on 
cross-race helping only; however, the game-team combination con- 
siderably increased the incidence of cross-race and cross-sex interaction 
over that of games alone. Katz’s theory of biracial work groups was
evaluated in light of the present results.


Even as many school systems are desegre- 
gating by altering the racial composition of 
their schools, social integration of minority 
groups remains minimal. Integrated educa- 
tion demands far more than changing the 
racial composition of schools and class- 
rooms; it demands establishing working re- 
lationships and understanding among the 
students and between teachers and students
(Georgeoff, Jones, Bahlke, & Howard, 1970; 
Wagoner, Glatt, & Gaines, 1970; Winnecoff 
& Kelly, 1971). Desegregated schools need to 
restructure the classroom in order to create 
more positive and constructive relationships 
among students from varying backgrounds. 
In the present study, the independent and 
combined effects of student teams and in- 
structional games on both cross-race and 
cross-sex interaction in the classroom are 
examined. 

There is considerable evidence that bar- 
riers to racial interaction exist among stu- 
dents in desegregated schools. Mann (1959), 
Webster (1960), St. John (1964), Gottlieb 
(1965), and Cusick (1973) found considera- 
ble within-race preference for both class- 
mates and friends by both black and white 
students. Such within-race preference has 
been substantiated for students from the 
sixth grade through college. As noted by 
both McPartland (1968) and Pettigrew 
(1969), the racial barriers are stronger for 
the more intimate relationships, such as 
friendship, than for formal task relation- 
ships. 

The existence of barriers to interaction 
among students across sex lines, particu- 
larly for children in early adolescence, has 
been frequently noted (Coleman, 1961; 
Waller, 1967). Abel and Sahinkaya (1962)
found strong within-sex preference even 
among four- and five-year-olds. Cross-sex 
relationships among junior high school stu- 
dents are particularly uncommon and, if 
they do occur, are often strained. Although 
barriers to cross-sex interaction have been
commonly noted for early adolescents, the 
interpersonal dynamics of such barriers 
have not been explored. Because the dy- 
namics of racial barriers are more clearly 
delineated and at first glance appear to be 
similar to those for cross-sex interaction, 
the authors used the racial prejudice litera- 
ture for predicting structural effects on both 
cross-race and cross-sex interaction. 

Recent reviews of the race relations liter- 
ature (cf. Amir, 1969; Pettigrew, 1969) 
suggest reasons why merely creating deseg- 
regated classrooms is not sufficient for im- 
proving race relations among the students. 
One important condition for effective cross- 
race contact appears to be the creation of 
multiracial interdependencies among the 
representatives of the various racial or 
ethnic groups. One way to structure such in-
terdependencies is the creation of biracial
student work groups (or teams) in which 
similar rewards are administered to all 
teammates. The use of biracial student teams 
has frequently been suggested as a way to 
create greater interracial cooperation and 
acceptance (Allport, 1954; Gottlieb, 1965; 
Thelen, 1970). 

A variety of empirical studies support the 
use of group or team rewards as a means of 
improving race relations (cf. Amir, 1969; 
Katz, 1970). However, as noted by Katz 
(1970), the ex post facto nature of the ma- 
jority of the studies reviewed leaves unclear 
the causal direction of the team reward/at- 
titude change relationship. An examination 
of the experimental studies of biracial work 
teams reveals mixed results. Witte (1972) 
created biracial student teams in college 
classrooms and observed their effects on in- 
terracial acceptance and interaction. The 
biracial team condition consisted of students 
performing frequently on individual tasks. 
However, teammates’ scores were combined 
to form a team score. A team grade was cal- 
culated and assigned to all teammates. 
Team practice sessions, during which peer-
tutoring activities among teammates were
allowed, constituted the only in-class cross- 
racial contact. The primary cross-racial 
bond was created by the reward interde- 
pendency. When compared to control con- 
dition scores (students performing individ- 
ually and being rewarded on the basis of 
individual performance), the biracial team
treatment created greater interracial peer 
tutoring, more racial acceptance (measured 
by several racial attitude scales), and less 
racial isolation (measured by a seating ag- 
gregation index). Witte's results can be in- 
terpreted as supporting Allport’s (1954) 
hypothesis that interracial acceptance can 
be increased if a common goal is created 
among teammates by making them cooper- 
atively dependent upon each other. 

Katz (1970) describes two experimental 
studies (Katz & Benjamin, 1960; Katz, 
Goldston, & Benjamin, 1958) which also ex- 
amined the effects of forming cooperative 
dependence across racial lines on interracial 
acceptance. In both studies, subjects were 
placed in four-member work groups (con- 
sisting of two black males and two white 
males). Subjects in all conditions were as-
signed a variety of tasks, with a group
product required for the vast majority of
the tasks. Cooperative dependence was ex-
perimentally manipulated by varying the
level of reward from the individual to the 
group. In the group-reward condition, sub- 
jects received a bonus payment based on the 
performance of the group as a whole. In the 
individual-reward condition, each subject 
received a bonus payment based on his per-
formance alone. For both studies, the com-
parison between the group- and individual-
reward conditions revealed no systematic
differences in interracial acceptance (as re-
flected by communication patterns).

The apparent contradiction between the
results of Katz and his associates (Katz &
Benjamin, 1960; Katz et al., 1958) and those
of Witte (1972) is explainable. An examina-
tion of the treatment manipulation in the
studies of Katz suggests that a high level of
cooperative dependence was created even in
the individual-reward condition. Although
students were rewarded individually, the
vast majority of the tasks were structured
so as to require considerable collaboration
among the subjects. That is, resources re-
quired to complete the task were shared by
the group as a whole, which created a mutual
dependence, even though rewards were ulti- 
mately based on individual performance. In 
contrast, the control condition in the Witte 
(1972) study created no such interdepend- 
ence among students. Because the studies 
conducted by Katz confounded task with 
reward interdependence, they constitute 
weak tests of the effects of team reward on 
race relations. 

Because the present study uses groups 
similar to those of Witte and groups which 
meet most of Allport’s conditions for effec- 
tive racial interaction, the authors predict 
increased cross-race interaction when stu- 
dents are placed on biracial work teams. 
The authors postulate a similar effect on 
cross-sex interaction for students placed on 
teams containing both boys and girls. In 
addition, Amir (1969) has noted that for 
biracial work groups to be effective in re- 
ducing cross-racial barriers, the group must 
be operating under high-reward contingen- 
cies. That is, biracial groups that are highly 
successful at the task assigned to them 
should evidence greater dissolution of racial 
barriers than groups which fail at the task. 
This hypothesis is tested for both cross-race
and cross-sex interaction in the present 
study. 

No studies have dealt with the effects of 
playing games on cross-race and cross-sex 
interaction and acceptance. A number of 
empirical studies (cf. Boocock & Schild,
1968) reveal a positive effect of game play- 
ing on student attitudes and academic 
achievement. However, playing a learning 
game places students of different races 
and/or sexes in face-to-face competition, 
and both Allport’s (1954) model and the 
studies by Katz (1970) suggest that inter- 
racial contact in competitive settings may 
be aversive to the individuals. In short, 
games do force students of different races 
and of opposite sex to interact on a pleasant 
task; however, the interaction is competi- 
tive. Therefore, although the predicted effect 
of games on cross-race and cross-sex inter- 
action is positive, it is less than the predicted 
effect of teams because of the competitive 
nature of the interaction in a game setting. 
Subjects


The subjects were 110 seventh-grade students 
at a junior high school located in a major east- 
coast metropolitan area; 43% of the students were 
blacks and 47% were males. The students came 
from predominantly lower-middle-class families. 


Design 


The experiment was a 2 X 2 randomized design; 
the two factors were task (game versus quiz) and 
reward (team versus individual). Students were 
randomly assigned to the four treatment condi- 
tions. The experiment was conducted over a 4-week 
period, and it involved 20 school days. All groups 
met during the first period of the day. Two male 
and two female teachers participated in the study. 
All were in their first or second year of teaching. 
At the midpoint of the study, the four teachers 
were rotated; consequently, each treatment was 
taught by both a female and a male teacher. There 
was no significant nonrandom clustering by race 
or sex of students across treatment conditions. The
racial distribution ranged from 38% to 52% blacks, 
and sex distribution ranged from 41% to 50% 
males. 


Independent Variables 


The experiment manipulated two major vari- 
ables—academic task and reward structure. The 
academic task placed students in either a quiz or 
game-playing performance situation. The reward 
structure involved the administering of rewards to 
either individual students or teams of students. 

The weekly schedule of all treatment groups 
proceeded as follows: Students performed on the 
math task (game or quiz) for half of the period 
each Tuesday and for the entire period on Friday. 
Practice sessions were conducted for half of the 
period each Tuesday and Thursday; these sessions
were separate from the game and quiz activities. 
For the remainder of the week, the students re- 
ceived traditional classroom instruction. 

Academic task. One task level consisted of semi- 
weekly teacher-made math quizzes on material 
covered during the preceding days. For each quiz, 
a student was assigned a percentage score based on 
the number of problems answered correctly. The 
second level consisted of semiweekly game sessions 
with students performing on the instructional 
game, Equations (Allen, 1969). Within a classroom 
of 30 students, 10 Equations games were played 
simultaneously (3 students per game). The players 
at each table were grouped homogeneously on 
mathematics achievement. At the end of each game 
session, the 3 students at a table were ranked ac- 
cording to their performance. The winner received 
6 points, the middle scorer 4 points, and the low 
scorer 2 points. Students who were high scorers 
were “bumped” up to a higher achievement table 
for the next game-playing session. Low scorers were
assigned to a lower achievement table for the fol- 
lowing session. Such bumping maintained homoge- 
neous game tables while taking into account new 
learning (as reflected in a student winning or los- 
ing). 
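
The scoring and bumping rule described above can be sketched in code. The following Python is only an illustration under stated assumptions, not the procedure as the authors implemented it: the function name `score_and_bump`, the ordering of tables from highest to lowest achievement, the treatment of the top and bottom tables (a winner at the top table and a low scorer at the bottom table simply stay where they are), and the absence of any tie-breaking rule are all hypothetical.

```python
def score_and_bump(tables):
    """Score one game session and seat students for the next one.

    tables: list of tables ordered from highest to lowest achievement
            (ordering is an assumption); each table is a list of three
            (student, game_score) pairs.
    Returns (points, next_tables).
    """
    points = {}
    next_tables = [[] for _ in tables]
    last = len(tables) - 1
    for i, table in enumerate(tables):
        # Rank the three players at this table by their game performance.
        ranked = sorted(table, key=lambda pair: pair[1], reverse=True)
        # Winner: 6 points, moves up; middle: 4 points, stays;
        # low scorer: 2 points, moves down.
        for (student, _), pts, move in zip(ranked, (6, 4, 2), (-1, 0, 1)):
            points[student] = pts
            # Clamp at the ends (assumed rule for top and bottom tables).
            destination = min(max(i + move, 0), last)
            next_tables[destination].append(student)
    return points, next_tables


session = [
    [("a", 10), ("b", 5), ("c", 1)],   # higher-achievement table
    [("d", 9), ("e", 4), ("f", 2)],    # lower-achievement table
]
points, next_tables = score_and_bump(session)
```

With the clamping rule assumed here, every table still seats three students after bumping, which keeps the tables homogeneous in size as well as in achievement.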

Reward structure. Both reward structures used
a classroom newsletter as a means of providing
public reinforcement for performance on either the
quiz or game. The newsletters were handed out to 
all students in each class on the day following 
every game session or quiz. The individual-reward 
condition involved the ranking of scores (based on 
either game or quiz performance) of individual 
students on the newsletter. Students who scored 
particularly high or who evidenced substantial im- 
provement in their scores were singled out for 
special attention. For the team-reward condition, 
the newsletter listed the preceding day's scores for 
each team (first page) and for each student (sec- 
ond page). The team scores were ranked, and par- 
ticular praise was given to the top teams or teams 
moving up rapidly in the rankings. The student 
teams consisted of four members and were designed 
with the purpose of creating maximal intrateam 
heterogeneity (on race, sex, and achievement) and 
interteam equality. Teammates were assigned ad- 
jacent seats and were allowed to work together 
during practice sessions. However, each team 
member did perform individually in both the game 
and quiz classes. The team score listed on the 
newsletter was obtained by summing the scores 
of all members present for the game or test. Teams 
with absent members were thus penalized, and no 
provisions were made for make-ups. 
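
The team-score rule stated above (sum the scores of the members present, with no make-ups) can be written out as a short sketch. The names `team_score` and `rank_teams`, and the use of `None` to mark an absent member, are hypothetical conventions chosen for illustration.

```python
def team_score(member_scores):
    """Sum the scores of members present; an absent member (None) adds nothing."""
    return sum(score for score in member_scores.values() if score is not None)


def rank_teams(teams):
    """Return (team, total) pairs ordered from highest to lowest team score."""
    totals = {name: team_score(members) for name, members in teams.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: team "B" has one absent member and is penalized.
teams = {
    "A": {"s1": 6, "s2": 4, "s3": 4, "s4": 2},
    "B": {"s5": 6, "s6": 6, "s7": None, "s8": 2},
}
ranking = rank_teams(teams)
```

Because the total is a plain sum rather than a mean, a team with an absent member cannot recover the missing points, which is the penalty the text describes.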

At no point during the study were the teachers 
or students made aware of the cross-race or cross- 
sex hypothesis nor was such interaction overtly 
encouraged in any of the groups. 


Dependent Variables 


Three sociometric items were included in a post- 
test questionnaire in order to assess the level and 
direction of student helping and friendship pat-
terns. The students were asked to give the names
of classmates (a) “whom you helped,” (b) “who
helped you,” and (c) “who are your friends.” Eight
blank lines were allotted for responses to each of 
the three questions. 

An examination of the interstudent agreement 
for the friendship choices revealed 61% mutual 
choices. A similar examination of the you helped/ 
helped you items revealed 46% agreement. The 
slight reduction in interstudent agreement on the 
helping relationship (as compared to friendship) 
is due to less agreement within the game classes. 
Students in the quiz classes agree about helping 
relationships almost twice as often (62%) as those 
in the game classes (33%). The difference is sta-
tistically significant (χ² = 28.08, df = 1, p < .001).
The nature of the student interaction in the game 
condition may have made it difficult for students 
to assess helping patterns. 
Hypotheses


Of interest in the present study are the effects 
of the task and reward factors on the number of 
cross-race and cross-sex selections (over all selec- 
tions) on both helping and friendship dimensions. 
It is hypothesized that both the game and team 
factors increase cross-race and cross-sex interaction, 
with the team effect being the larger of the two. 
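
The dependent measure named above, cross-group selections as a share of all selections, can be illustrated with a small sketch. The function name `cross_group_percentage`, the data layout (a list of chooser/chosen pairs plus a lookup of each student's group), and the example data are assumptions made for illustration; they are not taken from the study's records.

```python
def cross_group_percentage(selections, group_of):
    """Percentage of selections in which chooser and chosen differ in group.

    selections: list of (chooser, chosen) pairs from one sociometric item.
    group_of:   dict mapping each student to a group label (race or sex).
    """
    if not selections:
        return None  # no selections, so the percentage is undefined
    cross = sum(1 for chooser, chosen in selections
                if group_of[chooser] != group_of[chosen])
    return 100.0 * cross / len(selections)


# Hypothetical responses to the "whom you helped" item.
group_of = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
selections = [("p1", "p2"), ("p1", "p3"), ("p3", "p4"), ("p4", "p2")]
pct = cross_group_percentage(selections, group_of)
```

The same function serves for the cross-race and cross-sex measures; only the `group_of` lookup changes.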

The effect of level of reward on cross-race and 
cross-sex interaction within the two team condi-
tions was also of interest. Clearly, in the classroom
setting of this experiment it was neither desirable 
nor feasible to manipulate the level of rewards re- 
ceived by the team. Level of reward was made con- 
tingent on team performance. Consequently, the 
low- and high-reward conditions were determined 
in a post hoc manner. The two teams in each group 
which won most consistently were classified as 
high reward, and the two teams which were con- 
sistently last were classified as low reward. While 
the authors recognize that this procedure departs 
from the approach typically used in experimental 
social psychology, it is more consistent with behav- 
ior in actual classrooms. 


Results


The significance of the task, reward, and
interaction effects was tested using the log-
linear model for hypothesis testing in multi-
dimensional contingency tables (Goodman,
1970). The log-linear model for contingency 
tables is analogous to the linear model for 
analysis of variance. The main and inter- 
active effects are defined as linear functions 
of the logarithms of cell frequencies in the 
same manner as such effects are defined by 
linear functions of cell means in analysis of 
variance. When the total N of the contin- 
gency table is large, the variance of the ef- 
fects is estimated by the sum of the recipro- 
cals of cell frequencies. The ratio of the 
effect to the square root of its variance (re- 
ferred to in the present paper as a Z ratio) 
follows an asymptotic normal distribution 
and is the statistic used to test the signifi- 
cance of the effects. One-tailed tests of sig- 
nificance were employed for variables ana- 
lyzed with the log-linear model (see 
Goodman, 1969, 1970; Shaffer, 1973, for 
more detailed discussions of the model). 
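
The Z ratio described above can be sketched numerically. The Python below is an illustration under several assumptions, not the authors' computation: cell frequencies are reconstructed from the rounded percentages and totals in Table 1, each effect is coded as a +1/-1 contrast on the log frequencies, and the constant scale factor on the effect is dropped because it should cancel in the ratio of the effect to its standard error. The names `cell_counts` and `z_ratio` are hypothetical, and the sketch assumes no cell is empty.

```python
import math

# (percent cross-race, total selections) per treatment cell, read from Table 1.
YOU_HELPED = {
    ("individual", "quiz"): (33, 15),
    ("individual", "game"): (20, 30),
    ("team", "quiz"): (38, 60),
    ("team", "game"): (44, 52),
}
HELPED_YOU = {
    ("individual", "quiz"): (20, 20),
    ("individual", "game"): (29, 49),
    ("team", "quiz"): (34, 61),
    ("team", "game"): (54, 56),
}


def cell_counts(table):
    """Rebuild (cross, within) frequencies from rounded percentages (approximate)."""
    counts = {}
    for cell, (pct, total) in table.items():
        cross = round(pct * total / 100)
        counts[cell] = (cross, total - cross)
    return counts


def z_ratio(table, factor):
    """Z ratio for 'reward', 'task', or 'interaction' on the cross/within split."""
    effect = 0.0
    variance = 0.0
    for (reward, task), (cross, within) in cell_counts(table).items():
        r = 1 if reward == "team" else -1
        t = 1 if task == "game" else -1
        sign = {"reward": r, "task": t, "interaction": r * t}[factor]
        # Contrast on log frequencies: linear in the logs of the eight cells.
        effect += sign * (math.log(cross) - math.log(within))
        # Large-sample variance: sum of reciprocals of the cell frequencies.
        variance += 1 / cross + 1 / within
    return effect / math.sqrt(variance)


z_reward_you_helped = z_ratio(YOU_HELPED, "reward")
z_reward_helped_you = z_ratio(HELPED_YOU, "reward")
z_task_helped_you = z_ratio(HELPED_YOU, "task")
```

Under these assumptions the reward ratios come out near 1.69 for "you helped" and 2.41 for "helped you", and the task ratio for "helped you" near 1.68. Any small discrepancy from a reported figure is to be expected, since the counts are rebuilt from rounded percentages.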

The dependent variables of interest are 
the percentages of cross-race and cross-sex 
interaction for helping, being helped, and 
friendship. Analyses were conducted sepa-
rately for the cross-race and cross-sex interaction
dimensions.
Cross-Race Interaction


The log-linear model was used to examine 
the effects of task, reward, and the Task x 
Reward interaction on the level of cross- 
race interaction. A significant task effect 
was noted for the “helped you” variable only 
(Z = 1.68, p < .05). Significant reward ef- 
fects were observed for “you helped” (Z = 
1.70, p < .05) and “helped you” (Z = 2.41, 
p < .01), and a marginally significant effect 
was detected for friendship (Z = 1.60, 
p < .10). Teams resulted in greater cross- 
race interaction for all three sociometric 
measures. No significant Task x Reward 
interaction effects were obtained. 

The treatment cell percentages for cross- 
race helping and friendship are contained in 
Table 1. The level of cross-race selection for 
the individual-quiz treatment can be treated 
as a measure of interracial interaction in a 
traditional class. The table indicates 33% 
cross-racial selection for “you helped,” 20% 
for “helped you,” and 31% for “friends.” A 
definite preference by students for friends 
and helpmates of the same race existed in 
the traditional class. The significant, posi- 
tive team effect can be noted by comparing 
the percentages of the top two rows with 
those in the bottom two rows. 

The effect of level of team reward on 
cross-race interaction was examined by 
comparing the percentages of cross-race 
selections for the high-reward teams to
those of the low-reward teams. Level of 
reward was defined by success in the tour- 
nament. The chi-square test for difference 
in proportions was used (Ferguson, 1966). 
None of the three chi-squares computed 
was statistically significant (p > .05). An
examination of the cell percentages reveals 
a trend in a direction opposite of that pre- 
dicted (you helped: low reward = 55%, 
high reward = 39%; helped you: low re- 
ward = 53%, high reward = 35%; friends: 
low reward = 44%, high reward = 33%); 
that is, low-success teams appeared to create 
more, not fewer, cross-race selections on all
three dependent variables. 


TABLE 1

PERCENTAGE OF CROSS-RACE SELECTIONS FOR THE
FOUR TREATMENT CONDITIONS

Treatment      You helped   Helped you   Friends
Individual
  Quiz         33 (15)      20 (20)      31 (131)
  Game         20 (30)      29 (49)      27 (108)
Team
  Quiz         38 (60)      34 (61)      37 (159)
  Game         44 (52)      54 (56)      34 (147)

Note. Numbers in parentheses indicate total
number of student selections.


Cross-Sex Interaction


The log-linear model applied to the cross-
sex selection indicated no significant task
effects. Significant reward effects were noted
for all three sociometric variables (you
helped: Z = 2.61, p < .01; helped you:
Z = 3.09, p < .01; friends: Z = 2.88, p <
.01). A significant Task x Reward inter-
action was observed for the “helped you”
variable only (Z = 2.29, p < .05). Table 2 
contains the percentage of cross-sex selec- 
tions for the four treatment conditions. The 
traditional class (individual quiz) has mini- 
mal cross-sex interaction (you helped = 
13%, helped you = 10%, friends = 21%). 
The significant, positive team effect can be 
detected by comparing the percentages in 
the top two rows with those in the bottom 
two rows. 

Figure 1 delineates the significant inter- 
action effect noted for the “helped you” 
dimension. It appears that adding games to 
the curriculum had a positive effect on the 
level of cross-sex helping for the individual- 
reward condition and a slightly negative ef- 
fect for the team-reward condition. 
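
The interaction described above can be read directly from the "helped you" column of Table 2 as a difference of differences. The short sketch below only restates that arithmetic on the published percentages; the names `HELPED_YOU_CROSS_SEX` and `game_effect` are hypothetical, and the raw percentage-point contrast is a descriptive summary, not the log-linear test statistic.

```python
# Percent cross-sex "helped you" selections per cell, from Table 2.
HELPED_YOU_CROSS_SEX = {
    ("individual", "quiz"): 10,
    ("individual", "game"): 33,
    ("team", "quiz"): 49,
    ("team", "game"): 41,
}


def game_effect(table, reward):
    """Game-minus-quiz difference, in percentage points, within one reward level."""
    return table[(reward, "game")] - table[(reward, "quiz")]


individual_effect = game_effect(HELPED_YOU_CROSS_SEX, "individual")  # 33 - 10
team_effect = game_effect(HELPED_YOU_CROSS_SEX, "team")              # 41 - 49
interaction = team_effect - individual_effect
```

The game adds 23 percentage points under individual reward and subtracts 8 under team reward, which is the crossover pattern the text attributes to the interaction.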

TABLE 2

PERCENTAGE OF CROSS-SEX SELECTIONS FOR THE
FOUR TREATMENT CONDITIONS

Treatment      You helped   Helped you   Friends
Individual
  Quiz         13 (15)      10 (20)      21 (181)
  Game         27 (30)      33 (49)      17 (108)
Team
  Quiz         43 (60)      49 (61)      33 (159)
  Game         46 (52)      41 (56)      27 (147)

Note. Numbers in parentheses indicate total
number of student selections.

Figure 1. The Teams X Games interaction for
cross-sex selections.


The effect of level of team success on
cross-sex selection was also assessed and re-
vealed a marginally significant effect for
the “you helped” variable only (χ² = 2.93,
df = 1, p < .10). The cell percentages were
as follows: You helped: low reward = 52%, 
high reward = 30%; helped you: low re- 
ward = 47%, high reward = 45%; friends: 
low reward = 35%, high reward = 27%. 
The effect on cross-sex interaction for the 
“you helped” variable was in the opposite 
direction of that predicted. 
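
A chi-square test for a difference between two proportions, of the kind used for the high- versus low-reward comparisons, can be sketched as follows. This is an illustration only: the function names are hypothetical, the formula shown is the standard uncorrected 2 x 2 chi-square (the source does not say whether a continuity correction was applied), and the example counts are placeholders because the numbers of selections made by high- and low-reward teams are not reported.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two groups (e.g., low- and high-reward teams); columns
    are cross-group versus within-group selections.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Placeholder counts, purely hypothetical.
example_chi2 = chi_square_2x2(20, 10, 10, 20)

# Tail probability for the reported chi-square of 2.93 with df = 1.
reported_p = p_value_df1(2.93)
```

For the reported value of 2.93 the tail probability falls between .05 and .10, consistent with the text's description of the effect as marginally significant.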


Summary of Results 


The results indicate a positive impact of 
team rewards on cross-sex selections in the 
classroom for both helping and friendship 
interactions. A positive team effect was also 
noted for cross-race selections, although the 
effect was limited to helping or on-task in- 
teractions. The instructional game, particu- 
larly when used without student teams, had 
a much less pronounced effect than did 
teams. For cross-sex selections the game had 
no effects, and for cross-race selections a 
positive effect was noted only for the 
“helped you” dimension. Team success did 
not influence the level of either cross-sex 
or cross-race selection. 


Discussion 


The results indicate that administering 
team rewards to heterogeneous (on both 
race and sex) groups of students helps re-
duce race and sex barriers inhibiting student
interaction. The results partially support 
those of Witte (1972) in which group re- 
wards increased interracial interaction and 
acceptance. The team effect on reducing 
cross-racial barriers was particularly potent 
for the helping relationships. Only minimal 
generalization of the team effect to friend- 
ship dimensions was detected, calling into 
question whether or not basic changes in 
racial attitudes were created. For the cross- 
sex dimension, the positive effect of group 
reward on cross-sex helping behavior gen- 
eralized to the friendship dimension. 

The success of the team rewards in re- 
ducing barriers to social interaction may be 
due to the following factors. First, the group 
composition was uniquely heterogeneous. 
The group composition of the vast majority 
of the teams consisted of two blacks (one 
male and one female) and two whites (one 
male and one female). Perhaps forming such 
a heterogeneous group of students in the 
classroom, placing the students in adjacent 
seats, and allowing them to work together 
(without administering any group reward) 
would have had an effect on race relations 
similar to that observed in the present study. 
It would have been useful to implement an 
additional condition involving heterogene- 
ous groups (but with individual rewards) 
in the current study. As it now stands, it is 
unclear whether the team-reward effect is 
due to the team reward or to some combina- 
tion of the team-reward and group-composi- 
tion factors. 

A second factor in explaining the positive 
team-reward effect lies in the likely impact 
of the mutual interdependence (among 
teammates) on group processes. The reward 
interdependence created at the team level 
fosters interaction among teammates. Be- 
cause the teammates share control over each 
other's fate in the classroom, it is to their 
advantage to collaborate with one another 
and to foster each other’s progress in the 
class. In an analysis of additional data from 
the present study (DeVries & Edwards, 
1973), observations of team practice sessions 
revealed that substantial amounts of within-
team peer tutoring occurred in the team-
reward classes. Students in the team classes
also reported greater mutual concern among
their classmates than did those in the indi- 
vidual-reward condition (DeVries & Ed- 
wards, 1973). In short, team rewards ap- 
peared to alter the nature and extent of 
interpersonal relationships among the stu- 
dents. Because the teams formed in the 
present study were uniquely heterogeneous, 
such extended interpersonal relationships 
occurred across racial and sex lines.

An important distinction in the opera- 
tionalization of student teams in the present 
study as compared to those of Katz et al. 
(1958) and others should be noted. Tradi- 
tionally, biracial student teams involved 
teammates working on a group task. For 
example, a group of students might be asked 
to solve a specific human relations problem, 
arriving at a group solution. The group's 
solution would then be rated, with each 
teammate receiving the group grade. In the 
current study, each teammate, acting as a 
representative of his team, performed indi- 
vidually on either the quiz or game. Within- 
team interaction was not demanded, al- 
though time was allotted for semiweekly 
practice sessions during which teammates 
could interact. The use of individual per-
formance settings, although justifiable on a
pedagogical basis (it forces a higher level 
of accountability of student performance), 
probably had the effect of diluting the team- 
reward effect on cross-race and cross-sex 
interaction. Subsequent research by the au- 
thors will investigate the effect of team re- 
wards on reducing social barriers using (a) 
individual performance by teammates and 
(b) group performance by the entire team. 

Amir’s (1969) hypothesis that strong re- 
wards must be administered to the team if 
racial barriers are to be dissolved was not 
supported. Although the evidence is only 
suggestive, teams which consistently lost in 
the competition appeared to have slightly 
greater cross-race and cross-sex helping and 
friendship relationships than those which 
won. Perhaps failure at a task by a team
forces the individuals to realize the limits 
of their own resources and the importance 
of cooperating with their teammates. In con- 
trast, individuals on teams which are con- 
sistently successful are not confronted with 
the need for help from others and thus are 
not forced to break down racial and sex bar-
riers.

Unfortunately, because of the post hoc 
classification of the teams in the present 
study, the observed results might also be 
due to a different causal chain. More spe- 
cifically, teams which attempted greater 
cross-race and cross-sex interaction may
have used up considerable time and energy 
on such activities, leaving little time for 
actual preparation for the tournament. In 
contrast, teams which did not attempt to 
dissolve the barriers may have rather spent 
their time in individual study preparing for 
the tournament; that is, forming social re- 
lationships within a team may be a costly 
process which detracts from the academic 
performance of the teams. In short, although 
Amir’s hypothesis was not supported, more 
rigorous tests of the reward hypothesis are 
required before a definitive answer can be 
achieved. 

In seeking an explanation for the lack of 
a significant game effect on reducing race 
and sex barriers, one possible reason might 
lie in the interaction defined by the games. 
It is possible that few of the three-person 
groups at the game tables were actually bi- 
racial. An examination of the racial com- 
position at the game tables over the entire 
experimental period revealed that, on the 
average, 78% (range from 71% to 84%) of 
the students were playing at racially mixed 
tables on a given day. In short, it appears 
that the game treatment did create many 
biracial interaction situations but that such
situations did not foster greater interracial
friendships or helping relationships.


The lack of a significant game main effect 
is possibly due to the student competition 
created by the game. Although students in 
the Equations classes encounter students of 
the opposite sex and different races at the 
game tables, they do so in the context of 
meeting an “opponent.” The outcome of
such an encounter is always a ranking of 
the individuals on some status dimension. 
Students playing the game are not likely to 
go out of their way to assist their fellow 
players (their opponents). Additionally, the 
games-individual treatment involves small 
game-table groups whose membership con-
stantly changes. Even if two students de-
sired to become friends, the two would soon
be separated because each day they face 
different opponents. Because Equations 
(without teams) creates competition among 
students and because the composition of the 
groups changes daily, it fails to overcome 
race and sex barriers in the classroom. How- 
ever, when teams are added to the game 
structure (as advocated by its designer, 
Layman Allen), the combination facilitates 
both cross-race and cross-sex interaction. 

Work by Katz (1970) on biracial groups 
indicates it may be important to examine 
the team effects for black and white students 
separately. Katz views biracial groups as 
combining a high-status group (whites) and 
a low-status group (blacks). Given this the- 
ory, one would predict different reactions 
from whites and blacks to being on a bi- 
racial team. Whites as the high-status group 
might be threatened by the inclusion of 
blacks, particularly when included in their 
friendship groups. An examination of the 
cross-race data for the team-condition stu- 
dents revealed no significant differences (use 
of chi-square test) between the two racial 
groups in percentage of cross-racial selection 
for the three sociometric variables (you
helped: blacks = 49%, whites = 35%;
helped you: blacks = 38%, whites = 48%;
friends: blacks = 44%, whites = 33%).
The data suggest that the biracial interac- 
tion in the teams classes was not one of a 
low-status group trying desperately to gain 
entry into a high-status group (as suggested 
by Katz, 1970) but rather of two equal- 
status groups mutually sharing needed re- 
sources. 

The current study suggests that relations 
between racial and sex groups can be im- 
proved in the classroom if the teacher re- 
structures the reward system. Administering 
group rewards (to heterogeneous student 
groups) represents a classroom technique 
readily accessible to teachers across both 
grade levels and subject matter. Recent 
studies of team rewards in classrooms 
(Hamblin, Buckholdt, Ferritor, Kozloff, & 
Blackwell, 1971; DeVries & Edwards, 1973; 
Witte, 1972; Wodarski, Hamblin, Buck- 
holdt, & Ferritor, 1971) suggest that such 
rewards create high levels of student peer 
tutoring and, in turn, increase general academic
achievement. This study, in addition,
suggests such tutoring can take place between
students of different races and of
opposite sexes, thus contributing to the academic
achievement of the students involved
and advancing the integration process in
desegregated schools.

The study leaves unanswered several 
questions concerning the generalization of 
the observed effects on social interaction of 
minority groups in the classroom. First,
there is the issue of whether the observed
experimental effects on overt behavior
(using self-report measures) generalized to
the affective level. The marginal effect of
team rewards on cross-race friendships suggests
little attitudinal change. In contrast,
the cross-sex interaction increased for both
the formal task-related dimensions (helping)
and the more intimate social dimensions
(friendships). Whether such generalization
failed to occur even for cross-sex
interaction could only be answered by more
direct attitudinal measures. A second unanswered
question concerns the generalization
of the treatment effects over time. Although
it is unlikely that the increased
cross-race and cross-sex interaction would
persist once the team-reward structures were
dropped, it is at this point an open question
subject to future investigation.

A question raised earlier in the paper was 
whether structural variations which had
been posited as reducing natural barriers to
interaction across racial lines would act
similarly on barriers across sex lines. The
results of the present study suggest that
they do. Placing students on teams characterized
by a diversity of personal characteristics
of teammates, a common team
goal, and equal-status positions for all
teammates not only increases cross-race interaction
but has even greater positive effects
on cross-sex interaction. Both natural barriers
(race and sex) appear to operate under
similar dynamics.

One possible limitation of the current
results is the unique racial and sex distributions
within all classes of the present experiment.
All classes were characterized by
an almost 50-50 split along both race and
sex lines. If, for example, only 20% of the
students in a class were one race or one sex, 
such groups would truly represent a minor- 
ity in the classroom. In such classrooms, the 
heterogeneous student teams might not 
create a major reduction in racial or sex 
barriers as observed in the present study. 
Answers to questions such as these await 
further research.

The researchers studied the influence of two types of verbal feedback
on changes in teacher perceptions and behavior. The experiment was 
conducted in a microteaching-type laboratory setting in which 36 ex- 
perienced junior high school teachers taught the same content to two 
groups. Feedback between sessions consisted of information about (a) 
discrepancy between stated intent (subject's estimate of how he or she 
would teach) and observed behavior and (b) student learning outcomes.
To isolate feedback effects, prompts concerning desired behavior
and other training dimensions were removed from treatments.
Teacher behavior was coded using interaction analysis categories.
Analysis of variance indicated that feedback treatments were associated
with significant changes in teacher intent (p < .05). No significant
effects were observed for teacher behavior or for the interaction
of treatments. Results suggest that knowledge of student learning outcomes
and the reduction of intent-action discrepancy have minimal
impact on teacher decisions to modify classroom procedures.


The development and use of systematic 
observation procedures in teacher training 
has focused attention on feedback processes 
and their effects on teacher performance. A 
cursory glance at recent literature indicates 
that feedback experiences have become a 
standard component of proposals for 
teacher education reform (e.g., Smith, 1969).
Research on feedback effects, however, has 
often lacked adequate experimental controls 
and has been limited to a narrow range of 
feedback content. Moreover, the processes 
assumed to mediate teacher change from 
feedback have seldom been examined di- 
rectly. 

The present study was designed to in- 
vestigate, under controlled conditions, the 
effects of feedback on the perceptions and 
behavior of experienced teachers. In contrast
to most studies in this area, content
was held constant across teaching trials, and
feedback concerning both classroom behavior
processes (perceived and observed)
and student learning outcomes was provided
between sessions. Finally, research questions
were formulated in terms of discrepancy reduction,
a process frequently assumed to
operate in teacher reaction to feedback.

* The authors wish to thank R. H. Metzcus for
assistance in the conduct of the research and J.
Walter for help in preparing the report.

* Requests for reprints should be sent to Walter
Doyle, who is now at the College of Education,
North Texas State University, Denton, Texas

Research in the interaction analysis tradi- 
tion (e.g., Amidon & Hough, 1967; Flanders, 
1970) has generally been interpreted as in- 
dicating that coded feedback concerning 
classroom processes is an effective means for 
modifying teacher behavior. Flanders (1963, 
1964, 1965) has suggested that these feed- 
back effects can be explained in terms of 
teacher reduction of intent-action discrepancy.
That is, interaction analysis feedback,
offered in an objective and nonthreatening
manner, allows a teacher “to compare his
actions with his intentions” on the assumption
that “any reduction of the discrepancy
between what a teacher intends to do
and what he actually does is likely to result
in an improvement of instruction [Flanders,
1965, p. 13].”

Unfortunately, existing interaction analysis
research does not permit firm conclusions
about the effects of feedback or of intent- 
action discrepancy reduction. Studies in the 
Flanders tradition have typically employed
“training in interaction analysis,” a treatment
which, in addition to feedback, includes
discrimination training, reinforcement,
practice, and strong prompting in
favor of “indirect” teaching behaviors. It is 
difficult, therefore, to separate the effects of 
feedback and intent-action discrepancy re- 
duction from the effects of other treatment 
components, especially the learning of an 
approved set of teaching behaviors. 

In one of the few studies of discrepancy 
reduction in teacher behavior change, Tuck- 
man, McCall, and Hyman (1969) made an 
effort to remove bias toward indirect be- 
havior from interaction analysis training 
materials and to control for the magnitude 
of initial discrepancy between self-perceived 
and observed teaching behavior. They 
found, however, that high-discrepancy and 
low-discrepancy teachers did not change behavior
differentially. Rather, discrepancy
reduction was associated with a modification
of self-perception toward greater
congruence with observed performance.
These results suggest that, in the absence 
of an external model, reduction of discrep- 
ancy induced by process feedback results in 
a modification of personal intentions rather 
than teaching behavior. 

Surprisingly, feedback research has con- 
centrated almost exclusively on the effects 
of process content. It would seem, however, 
that a teacher’s decision to change behavior
operates within a matrix of information inputs,
of which process feedback represents
only one dimension. Presumably a teacher
evaluates a procedure also in terms of its
“success,” that is, its impact on student
learning outcomes. Such an evaluation
would seem to influence a decision to modify
behavior. Moreover, if a teacher is confronted
with information about student
learning outcomes and classroom processes,
the nature and degree of change would seem
to be a function of possible interactions between
these two sources or types of feedback.

From a discrepancy reduction viewpoint,
studies of teacher attribution, a process 
whereby persons assign causal responsibility 
for events and outcomes in an environment 
(Heider, 1958), have implications for 
formulating research questions about the 
effects of student performance feedback. In 
simulated teaching situations, Johnson, 
Feigenbaum, and Weiby (1964) and Beck- 
man (1970) found that teachers accepted 
personal credit for student success (attribu- 
tion to self) and assigned responsibility to 
outside factors for student failure (external 
attribution). In these studies, no attempt 
was made to relate attribution of causality 
for student performance to changes in per- 
ceived or observed teaching behavior. The 
results suggest, however, that insofar as stu- 
dent failure induces a discrepancy between 
intentions and outcomes, teachers resolve 
this discrepancy in a manner which does not 
necessarily involve behavior change. That is, 
if a teacher perceives himself responsible 
only for those methods that are successful, 
then there is little need to modify his per- 
formance when confronted with student 
failure. 

The purpose of the present study was to 
examine the effects of verbal feedback con- 
cerning (a) student learning outcomes and 
(b) the discrepancy between self-reported 
intent and observed action. Consistent with 
previous research on discrepancy reduction, 
effects of feedback were measured in re- 
lation to teacher changes in both percep- 
tions of how they would teach (intent) and 
actual classroom performance as measured 
by independent observers. Teachers receiv- 
ing feedback that their students were not 
successful were expected to change behavior 
to a greater extent than teachers who were 
successful. However, differential changes 
resulting from interaction effects were pre- 
dicted in two instances: (a) Teachers told 
that their actions did not match their in- 
tentions but that their students were success- 
ful were expected to modify intent; and 
(b) teachers told that their actions did not 
match their intentions and that their stu- 
dents were not successful were expected to 
change behavior.

Subjects


The sample consisted of 36 junior high school 
teachers who volunteered to participate in an ex- 
periment on microteaching. Subjects included 12 
female and 24 male teachers representing language 
arts, math, science, social studies, art, health, and 
occupational education. Subjects averaged 7.25 
years of teaching experience. 


Procedure 


The experiment was conducted in a microteach- 
ing-type laboratory setting in which subjects 
taught two 10-minute sessions to different groups 
of four students selected, within practical limits, on 
a random schedule. A 3 × 2 factorial design permitted
the manipulation of intent-action discrepancy
(high-low) and student performance (success-nonsuccess-control)
feedback conditions.
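The layout of the factorial design can be sketched as follows. This is an illustrative enumeration only, not part of the original procedure; the variable names are hypothetical, and the degrees of freedom follow the standard rules for a two-way analysis of variance with 36 subjects.

```python
from itertools import product

# Factor levels named in the text; the identifiers are illustrative only.
performance_feedback = ["success", "nonsuccess", "control"]
discrepancy_levels = ["high", "low"]
n_subjects = 36

# Every combination of the two factors is one cell of the design.
cells = list(product(performance_feedback, discrepancy_levels))

# Standard degrees of freedom for a two-way analysis of variance.
df_feedback = len(performance_feedback) - 1
df_discrepancy = len(discrepancy_levels) - 1
df_interaction = df_feedback * df_discrepancy
df_within = n_subjects - len(cells)

print(len(cells), df_feedback, df_discrepancy, df_interaction, df_within)
# prints: 6 2 1 2 30
```

These values agree with the degrees of freedom listed in the analysis of variance tables (2, 1, 2, and 30).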

Teaching content consisted of six Greek pre- 
fixes and suffixes and their English meaning (e.g., 
tele—far). On the day before they were to teach, 
subjects were introduced to the teaching task and 
were asked to complete an intent inventory. This 
intent measure was derived from the 10 categories 
of interaction analysis (Flanders, 1970) and re- 
quired that each subject estimate the percentage of 
time he expected to devote to various classroom 
behaviors during the teaching session. 

Following the first teaching session, subjects 
were told to wait while teaching behavior records 
were calculated and students were tested on the 
material just taught. After an appropriate interval 
(approximately 10 minutes), subjects received 
feedback regarding intent-action discrepancy and 
student test performance. Subjects were then given 
a second teaching task consisting of a different but 
equivalent set of Greek words and English mean- 
ings. Subjects also completed the intent inventory 
for the second teaching session. An observer was 
present during both teaching sessions, and audio 
recordings of each session were made. The teach- 
feedback-reteach cycle was completed in approxi- 
mately 40 minutes. 

Feedback was administered in both written and 
oral form by a research assistant unfamiliar with 
the nature of the experiment. Feedback content 
consisted first of a summary of observed teaching 
behavior (calculated in minutes) presented in con- 
junction with the subject’s stated intent as mea- 
sured prior to the teaching session. Subjects in the 
success group were also informed that the mean 
score for their students on the postsession test
was "significantly above the average as previously
established for this test." Subjects in the non- 
success group were told that their students scored 
below the average for the test, and control subjects 
received no student performance feedback. In order 
to avoid sensitizing subjects to a particular pattern 
of teaching behavior, no further discussion or 
elaboration of this information was permitted during
the feedback session.

The mode of feedback was designed to correspond
closely to the “verbal feedback” treatment
used by Tuckman et al. (1969). These researchers 
found that verbal feedback from an independent 
observer, in contrast to interaction analysis train- 
ing (without feedback) and to audiotape playback, 
was the only treatment associated with a signifi- 
cant change in teacher behavior. This feedback 
mode was selected for the present study because it 
appeared to be a potentially effective treatment 
which was not dependent upon prior familiarity 
with interaction analysis and which contained mini- 
mal external cues concerning “approved” teaching 
behaviors. Interviews with subjects used in the 
present research indicated that they had received 
no prior interaction analysis training or feedback. 


Assignment to Treatments 


Since the present study focused on the effects of
the discrepancy reduction process itself, rather 
than on the direction of teacher change, all sub- 
jects received the same classroom behavior feed- 
back content. The use of standard feedback con- 
tent also served to equalize treatment conditions 
in order to control for a possible source of vari- 
ation. For purposes of the study, classroom be- 
havior feedback content was derived from com- 
posite matrix data provided by Amidon and 
Flanders (1963) and consisted of 85% total talk, 
57% total teacher talk, 28% student talk, 15% in- 
direct teacher influence, and 42% direct teacher 
influence. Subjects were assigned to intent-action 
discrepancy feedback conditions according to their 
“discrepancy scores,” that is, the differences be- 
tween intent and the process feedback they were 
to receive. Discrepancy scores were computed from 
the absolute value of the difference between stated
intent as measured prior to the first teaching ses- 
sion and the standard feedback content selected for 
the study. Subjects with discrepancy scores above 
the median were assigned to the high-discrepancy 
group and those below the median to the low- 
discrepancy group. Subjects within each discrep- 
ancy group were then randomly assigned to the 
three student performance feedback treatments: 
success, nonsuccess, and control.³
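The assignment rule described above can be sketched in a few lines. The standard feedback percentages are those reported in the text; everything else is an assumption made for illustration: the teacher identifiers and intent values are hypothetical, the text does not say how differences were combined across categories (they are summed here), and scores exactly at the median are placed in the low group.

```python
from statistics import median

# Standard feedback content reported in the text (percentages of class time).
STANDARD_FEEDBACK = {
    "total_talk": 85,
    "teacher_talk": 57,
    "student_talk": 28,
    "indirect_influence": 15,
    "direct_influence": 42,
}

def discrepancy_score(intent):
    """Sum of absolute intent-feedback differences (summing is an assumption)."""
    return sum(abs(intent[key] - STANDARD_FEEDBACK[key]) for key in STANDARD_FEEDBACK)

def median_split(scores):
    """Label each subject high or low; ties at the median go to low (assumption)."""
    cut = median(scores.values())
    return {sid: ("high" if score > cut else "low") for sid, score in scores.items()}

# Hypothetical stated intents for four teachers.
intents = {
    "t1": {"total_talk": 80, "teacher_talk": 50, "student_talk": 30,
           "indirect_influence": 30, "direct_influence": 20},
    "t2": {"total_talk": 85, "teacher_talk": 55, "student_talk": 30,
           "indirect_influence": 15, "direct_influence": 40},
    "t3": {"total_talk": 70, "teacher_talk": 30, "student_talk": 40,
           "indirect_influence": 25, "direct_influence": 5},
    "t4": {"total_talk": 90, "teacher_talk": 60, "student_talk": 30,
           "indirect_influence": 20, "direct_influence": 40},
}

scores = {sid: discrepancy_score(intent) for sid, intent in intents.items()}
groups = median_split(scores)
print(scores)  # {'t1': 51, 't2': 6, 't3': 101, 't4': 17}
print(groups)  # {'t1': 'high', 't2': 'low', 't3': 'high', 't4': 'low'}
```

Random assignment to the three student performance feedback treatments would then be carried out separately within the high and low groups.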

Because of the element of deception involved in 
both feedback treatments, each subject was in- 
formed about the nature of the study immediately 
following the second teaching session. Responses to 
inquiries during these debriefing sessions indicated
that subjects did not doubt the validity of the 
feedback content they received.


³ In assigning subjects to conditions, discrepancy
level was treated as a moderator variable (Tuckman,
1972, pp. 41-43, 110-113). As a result, subjects
were not randomly assigned to discrepancy
levels, but “within each level on the moderator
variable,” subjects were randomly assigned to student
performance feedback treatments “to insure
that all other selection characteristics... were randomized
across conditions...” (Tuckman, 1972, p.

TABLE 1
ANALYSIS OF VARIANCE OF BEHAVIOR SCORE CHANGE (|B₂ − B₁|) BY TREATMENTS
AND HIGH-LOW DISCREPANCY

                                                      Teacher talk
                           Total talk      Total         Direct       Indirect    Student talk
Source                 df      MS     F      MS     F      MS     F      MS     F            F
Treatments              2   19.60  1.12   23.09  1.56   50.88  1.37   12.89   .65           43
High-low discrepancy    1    2.89   .17    9.51   .64   13.45   .36    9.82   .50          387
Interaction             2   25.77  1.47   25.42  1.72   14.79   .40    1.42   .07           18
Within                 30   17.48         14.77         38.52         19.70


Dependent Variables 


Dependent variables consisted of two categories: 
changes in teacher intent and changes in teaching 
behavior. Changes were scored in five areas: total 
talk, total teacher talk, student talk, direct in- 
fluence, and indirect influence. Intent changes 
were derived by computing the absolute values of 
the difference between pretreatment intent in- 
ventory scores and posttreatment intent scores. 
Behavior changes were computed from the abso- 
lute value of the difference between actual teach- 
ing behavior in the two teaching sessions. Actual 
teaching behavior was coded from audiotapes by 
two trained observers using the Flanders system 
of interaction analysis. Mean interobserver agreement
over the 72 sessions, using the Scott procedure
(see Flanders, 1967), was 81.7.
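For readers unfamiliar with the Scott coefficient, the following minimal sketch shows the textbook form of Scott's pi for two observers who code the same sequence of units. It is an illustration under stated assumptions, not the computation used in the study: the procedure in Flanders (1967) may differ in detail, and the category codes below are hypothetical.

```python
from collections import Counter

def scotts_pi(codes_a, codes_b):
    """Scott's pi: chance-corrected agreement between two coders."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same nonempty set of units")
    n = len(codes_a)
    # Observed proportion of units on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement uses category proportions pooled over both coders.
    pooled = Counter(codes_a) + Counter(codes_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1:
        raise ValueError("pi is undefined when only one category is used")
    return (observed - expected) / (1 - expected)

# Hypothetical interaction analysis category codes for ten tallies.
observer_1 = [5, 5, 4, 8, 5, 4, 8, 5, 5, 10]
observer_2 = [5, 5, 4, 8, 5, 5, 8, 5, 4, 10]
pi = scotts_pi(observer_1, observer_2)
print(round(pi, 3))
```

In this example the observers agree on 8 of 10 tallies, and the pooled category proportions give an expected agreement of .34, so pi is (.80 − .34) / (1 − .34), or about .70.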


RESULTS 


Change in Teaching Behavior 


Table 1 summarizes analysis of variance 
results indicating the effects of student per- 
formance feedback treatments and discrep- 
ancy conditions on the five dependent mea- 
sures of teaching behavior change. There 
were no statistically significant effects for 
discrepancy level or for treatments. No 
significant effects were found for the interaction
of treatments and discrepancy conditions.


Change in Teacher Intent 


Table 2 summarizes analysis of variance 
results concerning the effects of student per- 
formance feedback treatments and discrep- 
ancy conditions on the five measures of 
teacher intent change. The data indicate
that discrepancy conditions had a significant
effect on the amount of intent change
in the areas of direct influence (p < .05)
and indirect influence (p < .05). A comparison
of means for the two groups revealed
that high-discrepancy subjects (X̄ = 27.44
for direct influence and X̄ = 59.78 for indirect
influence) changed more than low-discrepancy
subjects (X̄ = 12.61 for direct
and X̄ = 17.50 for indirect) on these two
measures of teacher intent. The data further
indicate that there was no significant effect 
on intent change for student performance 
feedback treatments and no significant in- 
teractions between treatments and discrep- 
ancy conditions. 
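The reported means can be checked against the F ratios for the discrepancy main effect. The sketch below is a consistency check rather than the authors' computation. It assumes 18 subjects in each discrepancy group (the median split of 36 subjects, as the row totals of Table 3 also indicate) and takes the within-cell mean squares for direct and indirect influence from Table 2; the function name is hypothetical.

```python
def between_groups_ms(mean_a, mean_b, n_per_group):
    """Mean square for a two-group effect with equal group sizes (df = 1)."""
    # With two equal groups, SS_between = n * (mean_a - mean_b)**2 / 2.
    return n_per_group * (mean_a - mean_b) ** 2 / 2

n = 18  # assumed size of each discrepancy group

ms_direct = between_groups_ms(27.44, 12.61, n)
ms_indirect = between_groups_ms(59.78, 17.50, n)

# Within-cell mean squares as printed in Table 2.
f_direct = ms_direct / 269.62
f_indirect = ms_indirect / 3013.78

print(round(f_direct, 2), round(f_indirect, 2))
# prints: 7.34 5.34
```

Both ratios reproduce the F values marked as significant in Table 2, which suggests that the printed means and mean squares are mutually consistent.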

Because of possible selection bias intro- 
duced by the method of assigning subjects 
to treatments (see Footnote 3), chi-square 
analyses were conducted on the distribution 
of sex and experience across discrepancy 
levels. Results for sex were not significant 
(χ² = 1.13, df = 1, p > .20).⁴ Results for experience
indicate a moderate trend (χ² =
2.92, df = 1, p < .10)⁴ toward an inverse
relation between these two variables, that is,
high-experience subjects were assigned to
the low-discrepancy level and low-experience
subjects to the high-discrepancy level
(see Table 3). 
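The chi-square value for experience can be reproduced from the cell counts in Table 3. The sketch below uses the standard continuity-corrected (Yates) formula for a 2 × 2 table; that Tuckman's (1972) formula is equivalent is an assumption, and the function name is illustrative.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Table 3 counts: rows are high and low discrepancy,
# columns are high and low experience.
chi2 = yates_chi_square(4, 14, 10, 8)
print(round(chi2, 2))
# prints: 2.92
```

With one degree of freedom, the critical values are 2.71 at the .10 level and 3.84 at the .05 level, so 2.92 falls between them, in line with the reported p < .10.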


Discussion 


Under the conditions of the present study, 
feedback concerning student learning out- 
comes did not have a significant impact on 
the modification of teacher perceptions of 
how they would teach or on actual classroom
performance. Moreover, the interaction
of student performance and classroom
process feedback did not produce differential
changes in either teacher intent or
behavior. Resolution of intent-action discrepancy
was, however, associated with a
modification of teacher intent in the areas
of direct and indirect influence. These results
suggest that teachers, when asked to
reteach the same content, react to intent-action
discrepancy and student performance
feedback by revising their intentions of how
they will teach rather than their classroom
behavior, regardless of whether or not these
procedures were effective.

⁴ Chi-square calculations were based on Tuckman’s
(1972, pp. 248-249) formula which incorporates
a correction for continuity.

TABLE 2
ANALYSIS OF VARIANCE OF INTENT SCORE CHANGE (|I₂ − I₁|) BY TREATMENTS
AND HIGH-LOW DISCREPANCY

                                                                Teacher talk
                             Total talk            Total             Direct           Indirect        Student talk
Source                 df         MS     F          MS     F          MS     F          MS     F          MS     F
Treatments              2     384.03   .66      289.58   .88       32.86   .12      870.53   .29       75.69   .47
High-low discrepancy    1   1,007.36  2.86      802.78  2.45    1,980.25  7.34*  16,086.69  5.34*     367.36  2.29
Interaction             2     213.18   .37      234.02   .71      354.25  1.31    1,913.50   .63       17.36   .11
Within                 30     583.19            327.50            269.62          3,013.78            160.69

* p < .05.

Present findings must be interpreted, 
however, in light of possible limitations in- 
herent in the design. First, the experimental 
setting, although allowing for control of ex- 
traneous variables, may have been suffi- 
ciently artificial to minimize potential for 
behavior change. Consistency with results 
obtained under more natural conditions 
(e.g., Tuckman et al., 1969) would seem, 
however, to support the validity of the
present findings. A second limitation concerns
possible selection bias introduced by
the method of assigning teachers to treatments.
The analysis indicates a trend toward
differential experience levels across
high- and low-discrepancy conditions. Although
experience may be associated with
receptivity to feedback, existing research
has not confirmed such a relationship
(Tuckman & Oliver, 1968). Moreover, it is
unlikely that selection bias would have reduced
potential for behavior change or
would have been associated with systematic
changes in teacher intent as indicated in the
present data. It would seem, therefore, that
selection bias is not a serious threat to the
utility of the findings.

TABLE 3
DISTRIBUTION OF SUBJECTS BY EXPERIENCE
ACROSS DISCREPANCY LEVELS

                       Experience*
Discrepancy level     High     Low
High                     4      14
Low                     10       8

* Subjects with years of experience above X̄ =
7.25 were classified as high experience; those below
the mean as low experience.

Within these limitations, results of the 
present study suggest that the personal in- 
tentions of experienced teachers do not serve 
as performance models of sufficient power to 
change classroom behavior. In the absence 
of an externally validated and supported 
model of approved behavior, verbal feed- 
back would appear to affect perceptions 
rather than behavior. This interpretation
implies that the modification of teacher be- 
havior, in contrast to perceptions, is de- 
pendent on contingencies other than those 
provided by verbal feedback. Contingency 
analysis, along the lines suggested by McDonald
(1973), would seem to represent a
profitable direction for clarifying teacher
behavior change research.

Implicit in these findings is the notion 
that knowledge of student learning outcomes
has little impact on teacher decisions
to change behavior. Given the fact that stu- 
dent performance feedback is the only type 
available to teachers on a regular basis, 
further research on ways to optimize the 
effect of such feedback would seem to be 
imperative.

Longitudinal and cross-sectional comparisons were made of economically
disadvantaged children in one Follow Through and several non-
Follow-Through primary school programs. A longitudinal comparison 
was also made of the economically disadvantaged and nondisadvan- 
taged children in the Follow Through program. Clear-cut program 
effects were found in the fourth year (Grade 3) when Follow Through 
children were superior to non-Follow-Through children on IQ, achieve- 
ment, and social-motivational measures. The Follow Through program 
did not result in the economically disadvantaged children attaining 
the level of intellectual achievement displayed by the nondisadvantaged
children. The findings were discussed in relation to current
issues in the field of compensatory education. 


Optimism concerning the effectiveness of
preschool compensatory programs such as
Project Head Start has waned considerably
in the last few years. Findings have been
generally consistent that at the end of a
year of Head-Start-type experience, children
are superior to comparison children without
this preschool experience in both intellectual
and social-emotional functioning (cf.
1969; Klaus & Gray, 1968; Weikart,
Zigler & Butterfield, 1968). However, with
some exceptions (Ryan, 1974), the evidence
also indicates that the superiority of
Head Start children vanishes or is greatly
diminished by the end of one year of formal
public school (Bronfenbrenner, 1974; Westinghouse
Learning Corporation, 1969).

In the face of this evidence, some have
concluded that compensatory education in
general, or Head Start in particular, is a
failure (Eysenck, 1971; Jensen).
Others (Kohlberg, 1968; Zigler, 1973) have
responded to these findings with the argument
that the expectation of long-term effects
from a one-year preschool program was an
unrealistic one, since it was based on the
questionable assumption that children could
somehow be inoculated against the social,
economic, and educational disadvantages
which they would encounter later in life.
Within this view, compensatory education
would be seen as having a positive impact on
the lives of economically disadvantaged
children provided the effort were of long
enough duration. At the level of national 
social policy, this viewpoint resulted in the 
creation of the Follow Through project. 

The Follow Through project sponsors 
four-year compensatory education programs 
for both Head Start graduates and non-Head-Start
children. The project is an experimental
effort permitting a variety of
pedagogical approaches. (See Maccoby &
Zellner, 1970, for a description of the
models employed.) Although Follow
Through programs have been in operation
for several years, available national evaluation
data are limited to longitudinal findings
for children who have been in the program
for only one year (Stanford Research Institute,
1971). This initial evaluation revealed
enough differences in favor of Follow
Through children to lead some to conclude
that the project is a success, although others 
regard the results as disappointing in view 
of the relatively high cost of the project and 
the small absolute size of the differences be- 
tween Follow Through and non-Follow- 
Through children. Officials responsible for 
the Follow Through project have taken the 
position that “a definitive interpretation of 
the first-year findings must await the results 
of ongoing evaluation efforts.” * 

The major purpose of the present paper is 
to report the findings of one such evaluation 
effort. A Follow Through program based on 
a model which has been employed in a number
of Follow Through centers was studied
for four years. The effectiveness of the pro- 
gram was assessed through longitudinal 
data, cross-sectional data, and a special
longitudinal comparison of economically
disadvantaged and nondisadvantaged Follow
Through children.

* Wilson, R. C. Communication from the U. S.
Department of Health, Education, and Welfare,
Office of Education, Washington, D. C., February
13, 1969.

The longitudinal portion of the evaluation 
involved 35 economically disadvantaged 
children who attended the full four years 
(kindergarten through third grade) of the 
Follow Through program and 26 comparison 
children who were attending three non-Fol- 
low-Through schools during this same pe- 
riod. In view of the small numbers remain- 
ing in the longitudinal study at the end of 
the fourth year, the decision was made to 
also conduct a cross-sectional study com- 
paring the performance of the entire group 
of 42 economically disadvantaged children 
graduating from the Follow Through pro- 
gram (7 children transferred to the program 
from other schools during the first and sec- 
ond years) with the entire group of 100 
economically disadvantaged children of the 
same age and grade in four non-Follow- 
Through schools (74 children plus the 26 
longitudinal comparison children). 

The 35 longitudinal Follow Through chil- 
dren were also compared with 10 nondis- 
advantaged children who had attended the 
full four-year Follow Through program. 
This final comparison was conducted in
order to illuminate two issues: (a) the 
progress of nondisadvantaged children at- 
tending school classes composed primarily 
of economically disadvantaged children and 
(b) the comparative performance of eco- 
nomically disadvantaged and nondisadvan- 
taged children under this special program. 


METHOD 


Subjects 


Longitudinal. At the outset of the study, the 
subjects included 61 children attending Follow 
Through (FT) kindergartens and 48 children 
attending kindergartens in three non-Follow- 
Through (NFT) schools. All of the children 
resided in New Haven, Connecticut, and all were 
from economically disadvantaged families, defined
as follows: (a) lived in low-income housing,
(b) parents had no more than a high school educa- 
tion, and (c) parents were employed as semiskilled 
or unskilled workers or were unemployed. The FT 
group was recruited for the program by soliciting 
parents in several low-income areas. The NFT 
group was made up of all of the economically 
disadvantaged children in one classroom from each 
TABLE 1
CHARACTERISTICS OF LONGITUDINAL GROUPS

                           M CA at school       Sex        Head Start      Race       Father absent
School program        n    entrance           Boys Girls    Yes   No    Black White     Yes   No
                           (years-months)
Follow Through       35    5-1                 21   14      32    3      30    5        --   --
Non-Follow-Through   26    5-2                 12   14       7   19      19    7         5   --

Note. Abbreviation: CA = chronological age.


of three schools located in similar low-income 
areas. 

Upon completion of the longitudinal study four 
years later, 35 of the original 61 FT children and 
26 of the original 48 NFT children were com- 
pleting third grade in these schools. Three of the 
original FT children and three of the original 
NFT children were held back in kindergarten or 
first grade and were dropped from the longitudinal 
samples. The others who were dropped withdrew 
from these schools during the course of the study. 
An examination of the children’s demographic 
characteristics and their test scores at kinder- 
garten entrance revealed no systematic differences 
between either (a) those who stayed in or dropped 
out of the Follow Through program or (b) those 
who stayed in or dropped out of the non-Follow- 
Through programs.* 

The characteristics of the economically dis- 
advantaged longitudinal groups are presented in 
Table 1. The groups differed in four respects: As 
compared to the non-Follow-Through group, the 
Follow Through group had (a) a higher proportion 
of boys than girls, (b) a higher proportion of 
blacks than whites, (c) a higher incidence of Head
Start attendance, and (d) a higher incidence of 
father absence. In addition to these children, 10 
nondisadvantaged children (5 girls, 5 boys; 4 
black, 6 white) completed four years of the Fol- 
low Through program. These children resided in 
two-parent homes located in middle-income neigh- 
borhoods and the occupations of their fathers 
ranged from lower-middle to upper-middle class 
(e.g., fireman, salesman, professor).

Cross-sectional. The cross-sectional sample consisted
of all third-grade children in the Follow
Through program and in three non-Follow- 
Through programs who were economically dis- 
advantaged (as defined by the criteria given 
above), who were completing their fourth year of 
school, and for whom English was the primary 
language. The FT cross-sectional group consisted 
of 42 children: the 35 longitudinal children plus 
7 children who had transferred to the FT program 
during kindergarten or first grade. The NFT group 
of 100 children consisted of the longitudinal NFT 
4 Comparisons involving the larger samples
tested in kindergarten and first grade have been
reported elsewhere (Abelson, 1974).


group plus an additional 74 children. This group
was made up of the following school samples:

1. Inner-city tutorial (n = 48): nine longitudinal
children and their 35 classmates in a New Haven
inner-city school with a tutorial program, plus four
longitudinal children in another New Haven inner-city
school with the same program;

2. Outer-urban enriched (n = 30): thirteen
longitudinal children and their 17 economically
disadvantaged classmates in a New Haven school
attended by both economically disadvantaged
and nondisadvantaged children; and

3. Inner-city traditional (n = 22): twenty-two
children in an inner-city school 35 miles from New
Haven.

The characteristics of the FT and NFT cross-sectional
groups are presented in Table 2. As was
true of the longitudinal groups, the cross-sectional
groups differed in proportion of boys and girls,
incidence of Head Start attendance, and incidence
of father absence. Unlike the longitudinal groups,
the cross-sectional groups had similar proportions
of blacks and whites.


School Programs 


Follow Through. The Follow Through program
under study was conducted by the public schools
of New Haven and Hamden, Connecticut. The
program was based from its inception on the
educational model which is usually identified with
Bank Street College of Education (see Maccoby &
Zellner, 1970), and staff from Bank Street were
involved in the training of Follow Through teachers
during the last two years of the study. The
classes were composed of no more than [...] children
and at least two full-time staff (head teacher
and assistant teacher). Although the majority of
children were economically disadvantaged, each
class included some children from middle- and
upper-middle-income families.

The features which most markedly distinguished
the FT program from programs in the other
schools in the study were (a) an individual rather
than a group-oriented approach, (b) an
interest in the social-emotional development of the
child, or what is often called the "whole child"
approach, and (c) an emphasis on learning how to
learn through the mastery of underlying principles
and concepts. These features led to more
contacts between teachers and individual children
TABLE 2
CHARACTERISTICS OF CROSS-SECTIONAL GROUPS TESTED AT END OF THIRD GRADE

                           M CA               Sex        Head Start      Race       Father absent
School program        n    (years-months)   Boys Girls    Yes   No    Black White     Yes   No
Follow Through       42    8-9               25   17      37    5      37    5        17   25
Non-Follow-Through  100    8-10              51   49      26   74      89   11        27   73

Note. Abbreviation: CA = chronological age.


in FT classrooms than in NFT classrooms. They 
also resulted in the use of a broad array of teach- 
ing methods, for the manner in which a lesson was 
conducted in an FT classroom depended on the 
needs, interests, and ongoing responses of the 
children in that classroom, rather than on a pre- 
set curriculum or technique. The FT program 
emphasized verbal communication skills. In addi- 
tion, adult-child relations were oriented specifically 
toward fostering children’s self-esteem and inter- 
personal trust. 

Non-Follow-Through. In the longitudinal study,
three NFT schools in New Haven were employed;
these same schools plus a school in another 
Connecticut city were employed in the cross- 
sectional study. The four schools served neighbor- 
hoods which were comparable to those of the 
economically disadvantaged FT children. Three 
of the schools (including the one outside New 
Haven) were in inner-city neighborhoods with 
predominantly black and Spanish-speaking popula- 
tions. The other school was in a non-inner-city 
New Haven area with a 60% black, 40% white 
population. Children attending this school either 
came from low-income families residing in a public 
housing project or from middle-income families in 
the surrounding neighborhood. 
At the beginning of the study, the three schools
in New Haven had quite similar, traditional public
school programs. Classes were larger than in the
FT program, and classroom teachers seldom had
more than occasional outside assistance. Lessons
were usually organized for each class group as a 
whole, with the exception of reading which was 
traditionally taught with small groups. 

During the second year of the longitudinal 
study, class sizes were reduced and a number of
experimental projects were initiated in the New
Haven schools. In the two inner-city schools at- 
tended by NFT children, low-achieving children 
began to be tutored individually (several NFT 
children in the present study were tutored). In 
one of these schools, some children were placed in 
a busing program to schools in middle-class neigh- 
borhoods (five of the original NFT children were 
selected for this program and were therefore 
dropped from the study). The non-inner-city
school in the socioeconomically mixed neighborhood
was not involved in these projects, but the
facilities of this school were greatly expanded at


this time and the school staff initiated an extensive 
program of extracurricular activities. 

The programs of these three NFT schools thus 
changed in some significant respects during the 
course of the longitudinal study. Although the
pedagogical approaches used in the classrooms 
continued to reflect the group-oriented, didactic 
model which has traditionally been followed in 
public schools, the range of educational oppor- 
tunities that were available to the children 
broadened considerably. It was primarily because 
of these changes that the fourth NFT school, lo- 
cated in another city, was sought out and added 
to the cross-sectional study. As the children in 
this fourth NFT school had not received any 
special remedial or enrichment programs, they 
provided a sample of comparison children with a 
more traditional type of inner-city school experi- 
ence than the children in the other NFT schools. 


Measures 


Academic achievement. Academic achievement 
at the beginning and end of kindergarten was mea- 
sured with the Screening Test of Academic Readi- 
ness (STAR; Ahr, 1966). The STAR is a group- 
administered instrument designed to appraise 
general information, conceptual maturity, and 
perceptual-motor development in preschool- and 
kindergarten-aged children. Academic achievement
in Grade 1 was assessed with the Metropolitan 
Achievement Tests, Primary I Battery, Form A 
(Bixler, Durost, Hildreth, Lund, & Wrightstone, 
1958-1962). This group-administered battery in- 
cludes four tests: Word Knowledge, Word Dis- 
crimination, Reading Comprehension, and Arith- 
metic. 

Since NFT children took school-administered 
Metropolitan Achievement Tests twice a year 
during the last two years of the study, they be- 
came considerably more familiar with these in- 
struments than did the FT children. For this 
reason, a different instrument was employed for 
all children in Grade 3 to measure academic 
achievement—the Peabody Individual Achieve- 
ment Tests (PIAT; Dunn & Markwardt, 1970). 
The PIAT is an individually administered battery 
of Mathematics, Reading Recognition, Reading 
Comprehension, Spelling, and General Informa- 
tion tests. 

Intellectual abilities. Form B of the Peabody 
Picture Vocabulary Test (PPVT; Dunn, 1965) was
used to investigate verbal intellectual develop- 
ment throughout the four years of the study. The 
PPVT is an individually administered test which 
assesses verbal conceptual knowledge independent 
of reading ability. In addition, the individually 
administered Picture Arrangement Test of the 
Wechsler Intelligence Scale for Children (Wech- 
sler, 1949) was administered in Grade 3 to assess 
the children’s nonverbal, problem-solving abilities. 

Problem-solving style. Several measures were 
administered in first grade to assess the children’s 
effectiveness in handling unfamiliar tasks. One of 
these measures, the Sticker Game (Zigler & Tur- 
nure, 1964), was designed to investigate the extent 
to which children rely on imitation rather than on 
their own ideas in carrying out a task. The 
Sticker Game yields imitation scores based on the 
similarity of designs the child makes to three de- 
signs constructed by the examiner. The frequency 
with which the children spontaneously engaged in 
verbal communication with the examiner was also 
recorded during the Sticker Game. 

Also administered in first grade were the 
Circles Test from the Torrance Tests of Creative 
Thinking (Torrance, 1966), which was used to mea- 
sure creativity on a nonverbal task, and a modi- 
fied form of Torrance’s Just Suppose Test, which 
was used to measure creativity on a verbal task. 
These tests assess creativity by the originality, 
fluency, and flexibility of ideas expressed by the 
child. The Sticker Game and Circles Test were 
individually administered to each child in one ses- 
sion. The Just Suppose Test was individually 
administered in a second session. 

The Picture Arrangement Test employed in 
Grade 3 to measure nonverbal, problem-solving 
ability also provided an index of the degree to 
which children were impulsive as opposed to re- 
flective when tackling new problems. Impulsivity 
was gauged by the amount of time children spent 
studying the pictures before attempting to arrange 
them in the correct order. Since the Picture Ar- 
rangement Test problems differ in difficulty and in 
time allowed to complete them, a relative time 
measure was used, a latency score consisting of the 
ratio of the preliminary study time (time from the
"start" signal to the first reordering of the pictures)
to the overall problem-solving time. A latency 
score of 15% thus indicated that a child spent 15% 
of his problem-solving time in preliminary study of 
the pictures. Lower latency scores imply a more 
impulsive approach; higher latency scores imply a
more reflective approach.
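
The latency score described above can be illustrated with a short sketch. The function names and the sample timings are hypothetical, and averaging the per-problem ratios into one score per child is an assumption; the text specifies only the ratio itself.

```python
def latency_score(study_time: float, total_time: float) -> float:
    """Return preliminary study time as a percentage of total solving time."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    if not 0 <= study_time <= total_time:
        raise ValueError("study_time must lie between 0 and total_time")
    return 100.0 * study_time / total_time


def mean_latency(problems):
    """Average the per-problem latency scores (averaging is an assumption)."""
    scores = [latency_score(study, total) for study, total in problems]
    return sum(scores) / len(scores)


# Hypothetical timings in seconds: (study time, total time) for each problem.
timings = [(6.0, 40.0), (9.0, 60.0)]
print(mean_latency(timings))  # 15.0
```

Using a ratio rather than raw seconds keeps problems with different time limits on a common scale, which is the reason given in the text for adopting a relative measure.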

Attitude toward school. Children's attitudes to- 
ward school were assessed in Grade 3 with the 
Attitude Toward School Questionnaire (ASQ; 
Klein & Strickland, 1970). The children were pre- 
sented 54 situations involving classmates, teachers, 
lessons, and other aspects of school life. By circling 
one of three faces in their test booklets, the chil- 
dren indicated whether they would feel happy, 
neutral or don't know, or unhappy in each situa- 
tion. The ASQ was group-administered in two 
sessions to each class. Responses to the 53 scored 
items (the first item was a practice item) were
quantified on a scale of 1 equals unfavorable
attitude, 2 equals neutral, and 3 equals favorable
attitude. Attitude scores could thus range from 53
(totally unfavorable) to 159 (totally favorable).
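
As a check on the stated range, the overall ASQ score can be sketched as follows. The face labels and the function name are hypothetical; only the 1-2-3 coding, the 54 items, and the unscored practice item come from the text.

```python
# Coding from the text: 1 = unfavorable, 2 = neutral, 3 = favorable.
# The label strings themselves are hypothetical.
FACE_VALUES = {"unhappy": 1, "neutral": 2, "happy": 3}
N_ITEMS = 54  # includes the practice item, which is not scored


def asq_overall(responses):
    """Sum the 53 scored items; the first response is the practice item."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected 54 responses")
    scored = responses[1:]
    return sum(FACE_VALUES[face] for face in scored)


print(asq_overall(["unhappy"] * N_ITEMS))  # 53
print(asq_overall(["happy"] * N_ITEMS))    # 159
```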

The attitude data were factor analyzed to
determine whether there were meaningful item
clusters. Two main factors were revealed: an eight-item
factor in which the items all concerned school
work and a seven-item factor in which all items
involved situations where children were in contact
with a teacher or principal. The subscores on
these two factors (attitude toward academic work
and attitude toward authorities) and the overall
attitude scores were analyzed.
Self-image. A 32-item form of Coopersmith's
Self-Esteem Inventory (Coopersmith, 1967) was
administered in third grade, but the results are
not reported because a considerable number of
NFT children were unable to respond in a discriminating
way to the Coopersmith items. When
it became evident that Coopersmith results could
not be used, a semiprojective, exploratory measure
of children's perceptions of themselves as
students in school was specially developed for this
study. The instructions for this Student Self-Image
measure (SSI), which was administered
individually, were as follows:


Now we're going to draw a picture and you have
to pretend you are somewhere else. Here is a
sheet of paper and a pencil. Pretend that you are
in class. It is morning and the class is working.
The picture could be called "[child's name]'s
class at work." I'll write the title up here. Okay,
you draw a picture here of your class at work....
Now pretend the teacher said, "Today we are
going to start on some new work. It is something
we have never had before. Now listen
and try to get it the first time." Would you
probably get it the first time or not until
later?... Pretend you're having trouble with
the new work. Pretend you don't understand it.
What would you do?... What would you do if
you still didn't get it?... Do you think you'd
ever get it?... Are there some kinds of work
you do in school that you feel pretty good
at?... What do you like best about school?


Three dimensions of the child's image of himself
as a student were assessed on the SSI. The
child's sense of membership in his class group was
inferred from the manner in which he pictured
himself in his drawing of his class. Sense of membership
was rated on a scale of 0-4 (0 = the child
left himself out of the picture entirely; 1 = the
child drew himself alone; 2 = the child drew himself
with one other person; 3 = the child drew himself
as part of a group; and 4 = the child drew
himself as part of a group and the self-figure was
given prominence by reason of size, detail, or
central position).
The child's feelings of effectiveness in school
were assessed from the confidence and initiative
which he expressed in his responses to the
questions concerning a new lesson. Confidence was
rated as follows: 0, if the child did not think he
would learn a new lesson the first time it was
presented and thought he probably would never
learn it; 1, if the child was uncertain about whether
or not he would learn a new lesson; and 2, if the
child was confident he would learn a new lesson
and would probably learn it the first time it was
presented. Initiative was rated as follows: 0, if the
child could suggest no plan for coping with a
learning difficulty in class; 1, if the child suggested
asking someone else for help; and 2, if the child
suggested a plan which entailed adaptive, self-initiated
activity.

The child's involvement in academic learning
was assessed from what he said he liked best about
school. Learning involvement was rated as follows:
0, if the child said he liked nothing about school
except lunch or recess; 1, if the child expressed a
vague, general liking for school but not for any
specific activity; 3, if the child said he most liked
a nonacademic activity such as music or gym; and
[...] rating was used in order to give this dimension
the same range as the others.)

The ratings were summed to obtain an overall 
SSI score. These scores could range from 0 (nega- 
tive self-image) to 12 (positive self-image). In 
order to insure impartial ratings, all identifying 
information was blocked out of the drawings and 
answer sheets. Each protocol was then scored independently
by two raters, neither of whom had
previously been associated with the study. The 
product-moment correlation coefficient for the two 
raters’ SSI scores was +.87, indicating a satisfac- 
torily high degree of interrater agreement. (Inter- 
rater agreement for the four individual ratings 
ranged from +.84 to +.95.) 
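
The composition of the overall SSI score and the interrater check can be sketched as follows. The function names and sample ratings are hypothetical. The 0-4 ceiling for learning involvement is an inference from the stated 0-12 total (4 + 2 + 2 + 4), since the description of that scale is incomplete in the text.

```python
import math

# Maximum rating per dimension. The learning-involvement ceiling of 4 is
# an assumption inferred from the stated overall range of 0-12.
MAX_RATING = {"membership": 4, "confidence": 2, "initiative": 2, "involvement": 4}


def ssi_total(ratings):
    """Sum the four SSI ratings after checking each against its range."""
    for name, top in MAX_RATING.items():
        if not 0 <= ratings[name] <= top:
            raise ValueError(f"{name} rating out of range")
    return sum(ratings[name] for name in MAX_RATING)


def pearson_r(x, y):
    """Product-moment correlation between two raters' scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


best = {"membership": 4, "confidence": 2, "initiative": 2, "involvement": 4}
print(ssi_total(best))  # 12
```

Given two lists of SSI totals for the same protocols, one list per rater, `pearson_r` returns the kind of agreement coefficient reported above.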

Test behavior. A behavior checklist was em- 
ployed to record each child’s behavior during the 
two sessions in which the ASQ was administered in 
the third grade. Task-related verbal behavior of 
three types was recorded: (a) requests for assistance
(e.g., "Can you erase this for me?"); (b)
comments about procedures or materials (e.g.,
"How many pages are left?"); (c) comments or
questions about the meaning of the test items (e.g.,
in response to the item, "You came to school in
the morning. There is a sign near the door. It says
No School Today." "What day is school closed?
How I feel depends on what day school is
closed."). Each type of verbal behavior was recorded
as either present or absent during the
presentation of each of the 54 ASQ items. Maladaptive
behaviors (disruptions, withdrawal, refusal
to do the task) were also recorded but are
not reported because they occurred very infrequently.


Procedure 


The testing schedule over the four years of the 
longitudinal study was as follows: 
Kindergarten:
  September: PPVT (1 individual session)
  October: STAR (1 group session)
  April: STAR (1 group session)
  May: PPVT (1 individual session)
Grade 1:
  March: Problem-solving style (2 individual sessions)
  May: PPVT (1 individual session)
  June: Metropolitan Achievement Tests (2 group sessions)
Grade 3:
  March: ASQ, behavior checklist (2 group sessions)
  April: PPVT, SSI (1 individual session)
  May: Picture Arrangement Test, PIAT (1 individual session)


Testing of cross-sectional subjects was confined to 
Grade 3 and followed the Grade 3 schedule shown 
above. 

Intelligence and academic achievement tests 
were administered in accordance with the stan- 
dardized instructions given in the test manuals. 
The testing was carried out by 12 female examiners 
(4 for the kindergarten testing, 3 for the Grade 1 
testing, and 5 for the Grade 3 testing). Group test- 
ing took place in the classrooms with 1 of the 
examiners presenting the test items and 2 or 3 
others serving as proctors. (Teachers assisted the 
proctors, except during the Grade 3 administra- 
tion of the ASQ when school staff were excluded 
from the classrooms.) Children were taken out of 
their classrooms for individual testing. The in- 
dividual testing followed a schedule which insured 
that (a) every examiner tested both FT and NFT 
children on each measure and (b) no child was 
individually tested more than once by the same 
examiner. 
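
The two scheduling constraints on individual testing can be expressed as a simple check. This is a sketch only: the record layout, the identifiers, and the function name are hypothetical, and the check covers only examiner-measure pairings that actually occur in the records.

```python
from collections import defaultdict


def check_schedule(sessions):
    """Return a list of constraint violations for individual test sessions.

    Each session is (examiner, child, program, measures), where program is
    "FT" or "NFT" and measures is a tuple of tests given in that sitting.
    """
    programs_seen = defaultdict(set)   # (examiner, measure) -> programs tested
    sittings = defaultdict(int)        # (examiner, child) -> number of sessions
    for examiner, child, program, measures in sessions:
        sittings[(examiner, child)] += 1
        for measure in measures:
            programs_seen[(examiner, measure)].add(program)

    problems = []
    # (a) every examiner tests both FT and NFT children on each measure
    for (examiner, measure), programs in sorted(programs_seen.items()):
        if programs != {"FT", "NFT"}:
            problems.append(f"{examiner} gave {measure} to one program only")
    # (b) no child is individually tested twice by the same examiner
    for (examiner, child), count in sorted(sittings.items()):
        if count > 1:
            problems.append(f"{examiner} tested {child} {count} times")
    return problems


# Hypothetical Grade 3 schedule with two examiners and two children.
schedule = [
    ("E1", "c1", "FT", ("PPVT", "SSI")),
    ("E1", "c2", "NFT", ("PPVT", "SSI")),
    ("E2", "c1", "FT", ("Picture Arrangement", "PIAT")),
    ("E2", "c2", "NFT", ("Picture Arrangement", "PIAT")),
]
print(check_schedule(schedule))  # []
```

Balancing examiners across programs in this way keeps examiner differences from being confounded with program differences.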


RESULTS 


Analyses were performed on IQ, mental 
age (MA), and raw scores for the PPVT 
measure, but since in all cases the findings 
were the same for all three scores, only IQ 
scores are reported. For all other measures, 
analyses were performed on raw scores. 
Since the FT and NFT groups differed in 
their proportion of boys and girls, all anal- 
yses incorporated sex as a separate dimen- 
sion. 


Longitudinal Study 


In order to assess the comparability of the 
FT and NFT groups at the outset of the 
longitudinal study, 2 X 2, Program (FT
X NFT) X Sex unweighted means analyses
of variance were conducted on the PPVT 
and STAR scores obtained at the beginning 
of kindergarten. No significant effects were 
TABLE 3
PPVT PERFORMANCE OF HEAD START SAMPLES IN FT AND NFT LONGITUDINAL GROUPS DURING KINDERGARTEN AND FIRST GRADE

                             FT - Head Start      NFT - Head Start    NFT - non-Head-Start
                              Boys      Girls      Boys      Girls      Boys      Girls
Time of testing             (n = 20)  (n = 12)   (n = 4)   (n = 3)    (n = 8)   (n = 11)
Beginning of kindergarten     92.60     71.58     94.50     87.33     64.38     67.09
End of kindergarten          102.55     89.58     99.50     88.67     96.25     86.36
End of first grade           104.85     94.82    104.00     88.33     90.38     86.70

Note. Abbreviations: PPVT = Peabody Picture Vocabulary Test, FT = Follow Through, and
NFT = non-Follow-Through.


found in the analysis of the PPVT scores. 
While no program effect was found for the 
STAR scores, a sex effect (F = 8.47, df = 
1/55, p < .01) and a significant Program x 
Sex interaction (F = 4.23, df = 1/55, p < 
.05) were discovered. Comparisons among 
the four groups by the Newman-Keuls 
method (Winer, 1971) indicated that the sex 
effect was largely due to the high scores of 
the NFT girls. 
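
For readers unfamiliar with the procedure, a 2 X 2 unweighted means analysis of variance can be computed from cell means, standard deviations, and cell sizes alone. The sketch below follows the standard textbook formulation (harmonic mean of the cell sizes, pooled within-cell error term). It is an illustration, not the authors' computation, and the function and variable names are hypothetical.

```python
def unweighted_means_anova(means, sds, ns):
    """2 x 2 unweighted-means ANOVA from cell summaries.

    means, sds, ns are nested lists indexed [program][sex].
    Returns F ratios for both main effects and the interaction.
    """
    a, b = 2, 2
    cells = [(i, j) for i in range(a) for j in range(b)]
    # Harmonic mean of the cell sizes replaces the common n.
    n_h = (a * b) / sum(1.0 / ns[i][j] for i, j in cells)
    grand = sum(means[i][j] for i, j in cells) / (a * b)
    row = [sum(means[i]) / b for i in range(a)]                        # program means
    col = [sum(means[i][j] for i in range(a)) / a for j in range(b)]   # sex means
    ss_program = n_h * b * sum((r - grand) ** 2 for r in row)
    ss_sex = n_h * a * sum((c - grand) ** 2 for c in col)
    ss_inter = n_h * sum(
        (means[i][j] - row[i] - col[j] + grand) ** 2 for i, j in cells
    )
    # Pooled within-cell variance is the error term.
    df_error = sum(ns[i][j] for i, j in cells) - a * b
    ms_error = sum((ns[i][j] - 1) * sds[i][j] ** 2 for i, j in cells) / df_error
    # Each effect has 1 df in a 2 x 2 design, so MS equals SS.
    return {
        "F_program": ss_program / ms_error,
        "F_sex": ss_sex / ms_error,
        "F_interaction": ss_inter / ms_error,
        "df": (1, df_error),
    }


# Grade 3 PPVT cell summaries as printed in Table 4 ([FT, NFT] x [boys, girls]).
result = unweighted_means_anova(
    means=[[100.95, 89.57], [92.92, 86.07]],
    sds=[[11.51, 10.46], [9.10, 12.55]],
    ns=[[21, 14], [12, 14]],
)
print(result)
```

Because the cell sizes printed in the tables may not match the number of children tested on a given measure, the resulting F ratios should be read as approximate.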

As can be seen in Table 1, the FT and 
NFT groups differed in incidence of father 
absence and of Head Start experience. The 
father absence variable was ignored after 
preliminary analyses revealed that it was 
not significantly related to any dependent 
measure at any testing. 

Because of the extremely small number of 
non-Head-Start children in the FT program, 
the effects of Head Start experience were examined
in 3 X 2, Education (FT - Head
Start X NFT - Head Start X NFT - Non-Head-Start)
X Sex unweighted means analyses
of variance on each dependent measure
at each testing. Head Start effects were 
found only at the beginning of kindergarten 
and the end of first grade (see Table 3). Head 
Start experience had a significant effect on 
PPVT performance at the beginning of 
kindergarten (main effect for education, F = 
4.61, df = 2/52, p < .05) when both FT and
NFT Head Start groups scored higher than 
the NFT - non-Head-Start group. At the 


5 The groups also differed in proportion of blacks 
and whites, but there were too few white children 
in the sample to permit meaningful analyses for 
race effects. 




end of kindergarten, there were no differ- 
ences in the PPVT performance of these 
three groups. The PPVT scores obtained at 
the end of first grade showed a combined 
Head Start - Follow Through effect. A main
effect for education (F = 3.47, df = 2/52,
p < .05) was found; however, comparisons
among the groups revealed that only two of
the three groups differed from each other:
FT - Head Start children scored significantly
higher than NFT - non-Head-Start
children. Group comparisons indicated that
FT - Head Start children also scored significantly
higher than NFT - non-Head-Start
children on the two verbal measures of
problem-solving style administered in first 
grade (frequency of verbalization and crea- 
tive thinking, verbal), although main effects 
for education were not found in the analyses 
of variance of these measures. No Head 
Start effects were found on any other 
kindergarten or Grade 1 measure nor were 
they found on any of the Grade 3 measures. 

The scores of the FT and NFT children at
every testing for school program effects are 
presented in Table 4. A 2 x 2 (Program X 
Sex) unweighted means analysis of variance 
was performed on each measure at each test- 
ing? (One-tailed tests of statistical signifi- 


6 Covariance adjustment was possible for the
PPVT measure since it was administered at every
testing. End-of-kindergarten raw scores were used
as the covariable for three-group comparisons
(FT - Head Start X NFT - Head Start X NFT -
Non-Head-Start) and two-group comparisons
(FT X NFT) of Grade 1 raw scores and Grade 3
raw scores. (Beginning kindergarten scores could
not be used because the PPVT performance of
TABLE 4
SCORES FOR FT AND NFT LONGITUDINAL GROUPS

                                       Follow Through                 Non-Follow-Through
                                Boys (n = 21)  Girls (n = 14)   Boys (n = 12)  Girls (n = 14)
Measure                              M      SD       M      SD       M      SD       M      SD
End of kindergarten
PPVT                                --      --   88.36   18.67   97.33   15.37   86.86   17.31
STAR                                --      --   51.79   11.99      --      --      --      --
End of Grade 1
PPVT                            103.52   10.10   94.23   11.20   94.09   10.41   87.08   12.90
MAT, Word Knowledge              19.39    7.54   20.69    9.29   19.10    7.11   16.50    9.69
MAT, Word Discrimination         19.50    9.38   21.31    9.35   19.30    9.50   18.42    7.44
MAT, Reading Comprehension       15.06    6.95   20.17   10.11   13.67    5.77   14.92    8.12
MAT, Arithmetic                  39.35   12.04   39.92   15.38   42.20   11.72   40.08   10.08
Imitation                         9.70    5.25    9.38    4.56   10.73    5.44   12.50    5.23
Frequency of communication        7.85    9.44    9.54   14.16    2.82    4.19    1.64    2.34
Creative thinking, nonverbal     35.15   21.78   28.08   17.58   32.46   19.18   32.14   18.79
Creative thinking, verbal         9.10    5.97    7.62    6.28    6.18    6.51    4.36    4.77
End of Grade 3
PPVT                            100.95   11.51   89.57   10.46   92.92    9.10   86.07   12.55
Picture Arrangement Test         23.67    6.95   20.64    6.26   19.46    8.10   17.29    8.92
PIAT, Mathematics                37.05    8.29   33.57    8.51   32.91   10.42   29.93    7.71
PIAT, Reading Recognition        32.14    8.52   30.64    8.48   34.09    8.83   30.64    6.17
PIAT, Reading Comprehension      30.48    7.62   31.00    9.65   31.54    7.54   30.64    4.83
PIAT, Spelling                   33.48    9.30   34.93   13.22   36.46    9.21   33.14    7.56
PIAT, General Information        26.81    8.69   19.00    8.37   18.82    7.51   19.36    6.61
Problem-solving latency          31.05    7.45   27.00    5.39   32.64    6.07   25.36    8.68
ASQ, academic work               15.94    4.28   17.15    4.24   16.58    5.00   18.00    3.02
ASQ, school authorities          12.59    3.16   13.08    2.96   12.75    2.56   13.08    2.58
ASQ, overall attitude           114.71    8.64  119.08   13.09  115.25   16.09  123.17    6.47
Student Self-Image                8.00    3.12    8.00    2.63    6.50    3.00    8.75    2.80
Comments on procedures           11.33   10.02    7.62    7.18    7.25    6.94    3.57    4.38
Requests for help                   --    1.42    1.92    2.02    1.00    1.48     .64    1.65
Comments on meaning               2.10      --    1.92    2.63      --      --      --      --

Note. Abbreviations: FT = Follow Through, NFT = non-Follow-Through, PPVT = Peabody Picture
Vocabulary Test, STAR = Screening Test of Academic Readiness, MAT = Metropolitan Achievement
Test, PIAT = Peabody Individual Achievement Test, and ASQ = Attitude Toward School
Questionnaire.


cance were employed to assess program 
main effects in view of the expectation that 
the FT children would have higher scores. 
Two-tailed tests were employed to assess 
sex main effects and interaction effects.) The 
results of these analyses are summarized in 
Table 5. The three program effects found 
at the end of Grade 1 were combination 
Head Start - Follow Through effects, as


reported above. On every Grade 3 measure 


many children appeared to be attenuated by motivational
factors at that testing.) The findings were
comparable to those reported for the unadjusted
analyses of variance except that the significance
levels for the Grade 1 differences were improved.




where a program effect was found, FT children
scored higher than both NFT - Head
Start and NFT - non-Head-Start children.


Cross-Sectional Study 


Preliminary analyses of the Grade 3 cross- 
sectional data revealed that father absence 
and Head Start attendance did not sig- 
nificantly influence performance on any of 
the dependent measures. The scores of the 
FT and NFT children are presented in Table 
6. A 2 x 2 (Program X Sex) unweighted 
means analysis of variance was conducted 
on each of the measures. The results of these 
analyses are summarized in Table 7. 
TABLE 5
SUMMARY OF FINDINGS FOR FT AND NFT LONGITUDINAL GROUPS

Measure                         Program       Sex               Program X Sex
End of kindergarten
PPVT                            ns            boys > girls*     ns
STAR                            ns            ns                ns
End of Grade 1
PPVT                            FT > NFT**    boys > girls**    ns
MAT, Word Knowledge             ns            ns                ns
MAT, Word Discrimination        ns            ns                ns
MAT, Reading Comprehension      ns            ns                ns
MAT, Arithmetic                 ns            ns                ns
Imitation                       ns            ns                ns
Frequency of communication      FT > NFT**    ns                ns
Creative thinking, nonverbal    ns            ns                ns
Creative thinking, verbal       FT > NFT*     ns                ns
End of Grade 3
PPVT                            FT > NFT*     boys > girls**    ns
Picture Arrangement Test        FT > NFT*     ns                ns
PIAT, Mathematics               FT > NFT*     ns                ns
PIAT, Reading Recognition       ns            ns                ns
PIAT, Reading Comprehension     ns            ns                ns
PIAT, Spelling                  ns            ns                ns
PIAT, General Information       FT > NFT*     boys > girls*     ns
Problem-solving latency         ns            boys > girls**    ns
ASQ, academic work              ns            ns                ns
ASQ, school authorities         ns            ns                ns
ASQ, overall attitude           ns            ns                ns
Student Self-Image              ns            ns                ns
Comments on procedures          FT > NFT*     ns                ns
Requests for help               ns            ns                ns
Comments on meaning             FT > NFT**    ns                ns

Note. Abbreviations: FT = Follow Through, NFT = non-Follow-Through, PPVT = Peabody Picture
Vocabulary Test, STAR = Screening Test of Academic Readiness, MAT = Metropolitan Achievement
Test, PIAT = Peabody Individual Achievement Test, and ASQ = Attitude Toward School
Questionnaire.
* p < .05.
** p < .01.


Since the NFT children attended three 
types of school program (mixed socioeconomic
grouping with enriched program,
tutorial program, traditional program), a
3 X 2, School (Mixed X Tutorial X Traditional)
X Sex unweighted means analysis of
variance was performed on the NFT chil- 
dren's scores on each of the dependent mea- 
sures. The results are summarized in Table 
8. On 2 of the 11 measures on which a signifi- 
cant FT effect was found (Mathematics and 
problem-solving latency), a significant dif- 
ference among the three NFT schools was 
also found. On these 2 measures, the anal- 
ysis was rerun with the FT children included 
as a fourth school group. On the Mathe- 
matics test, a school main effect (F = 6.03, 


df = 3/132, p < .001) was again found. Em- 
ploying the Newman-Keuls procedure, com- 
parisons of the four schools indicated that 
FT children scored significantly higher than 
NFT children in the tutorial and traditional 
schools, but FT children did not score differently
from NFT children in the socioeconomically
mixed school. On the problem-solving
latency measure, a school main effect
(F = 7.50, df = 3/132, p < .001) was again
found. Comparisons of the four schools indicated
that FT children had significantly
higher latency scores than NFT children in
only one of the NFT schools, the traditional
school.

In the major cross-sectional analyses,
then, FT children were compared with 8 - 
TABLE 6
SCORES FOR FT AND NFT CROSS-SECTIONAL GROUPS AT END OF THIRD GRADE

                                       Follow Through                 Non-Follow-Through
                                Boys (n = 25)  Girls (n = 17)   Boys (n = 51)  Girls (n = 49)
Measure                              M      SD       M      SD       M      SD       M      SD
PPVT                            101.84   11.10   90.24    9.69   91.26   12.33   86.41   11.04
Picture Arrangement Test         23.92    6.63   20.88    6.18   19.41    7.72   16.61    7.65
PIAT, Mathematics                36.48    8.36   35.29    8.62   33.27    9.56   28.29    7.98
PIAT, Reading Recognition        32.64    9.00   31.65    8.25   30.78    8.41   30.61    6.36
PIAT, Reading Comprehension      30.80    7.91   31.88    9.17   29.18    7.05   29.14    5.41
PIAT, Spelling                   33.80    9.18   36.18   12.78   32.12    8.36   32.22    6.95
PIAT, General Information        27.84    9.94   20.88    9.95   20.16    7.76   17.74    5.77
Problem-solving latency          31.20    7.82   27.94    5.36   26.76    9.68   23.35    8.99
ASQ, academic work               15.96    4.10   17.63    3.95   17.11    4.21  18.290    3.41
ASQ, school authorities          12.92    3.59   13.38    3.05   12.69    3.62   12.67    3.66
ASQ, overall attitude           114.56   10.36  120.25   12.44  116.54   15.19  122.23   10.91
Student Self-Image                8.12    2.85    8.06    2.44    6.77    2.35    7.41    2.95
Comments on procedures           12.36    9.81    7.25    7.25    7.17    6.75    3.92    4.18
Requests for help                  .80    1.35    2.50    3.37    1.15    1.78     .75    1.50
Comments on meaning               3.16    4.49    2.38    2.90     .22     .59     .46    1.34

Note. Abbreviations: FT = Follow Through, NFT = non-Follow-Through, PPVT = Peabody Picture
Vocabulary Test, PIAT = Peabody Individual Achievement Test, and ASQ = Attitude Toward
School Questionnaire.


TABLE 7
Summary of Findings for FT and NFT Cross-Sectional Groups at End of Third Grade

Measure                        Program        Sex               Program × Sex

PPVT                           FT > NFT**     boys > girls***   ns
Picture Arrangement Test       FT > NFT**     boys > girls*     ns
PIAT, Mathematics              FT > NFT**     ns                ns
PIAT, Reading Recognition      ns             ns                ns
PIAT, Reading Comprehension    FT > NFT*      ns                ns
PIAT, Spelling                 FT > NFT*      ns                ns
PIAT, General Information      FT > NFT***    boys > girls**    ns
Problem-solving latency        FT > NFT**     boys > girls*     ns
ASQ, academic work             ns             ns                ns
ASQ, school authorities        ns             ns                [illegible]
ASQ, overall attitude          ns             girls > boys*     ns
Student Self-Image             FT > NFT*      ns                [illegible]
Comments on procedures         FT > NFT**     boys > girls**    ns
Requests for help              FT > NFT*      ns                FT girls > NFT boys > NFT girls
                                                                [partly illegible; "FT boys" also appears]
Comments on meaning            FT > NFT***    ns                [illegible]

Note. Abbreviations: FT = Follow Through, NFT = non-Follow-Through, PPVT = Peabody
Picture Vocabulary Test, PIAT = Peabody Individual Achievement Test, and ASQ = Attitude
Toward School Questionnaire.
* p < .05.


TABLE 8
Summary of Findings for Three NFT Schools (Mixed Socioeconomic Status, Tutorial,
Traditional) in Cross-Sectional Group at End of Third Grade

Measure                        School                    Sex              School × Sex

PPVT                           ns                        ns               ns
Picture Arrangement Test       ns                        ns               T boys & M girls > others*
PIAT, Mathematics              M & T > Tr*               boys > girls**   ns
PIAT, Reading Recognition      ns                        ns               ns
PIAT, Reading Comprehension    ns                        ns               ns
PIAT, Spelling                 ns                        ns               ns
PIAT, General Information      ns                        ns               ns
Problem-solving latency        M & T > Tr [marker        ns               ns
                               illegible]
ASQ, academic work             ns                        ns               ns
ASQ, school authorities        M & T > Tr**              ns               M girls > others > Tr girls
                                                                          [marker illegible]
ASQ, overall attitude          ns                        girls > boys*    ns
Student Self-Image             ns                        ns               ns
Comments on procedures         ns                        boys > girls**   ns
Requests for help              [illegible]               [illegible]      [illegible]
Comments on meaning            [illegible]               [illegible]      [illegible]

Note. Abbreviations: NFT = non-Follow-Through, M = mixed socioeconomic status,
T = tutorial, Tr = traditional, PPVT = Peabody Picture Vocabulary Test, PIAT = Peabody
Individual Achievement Test, and ASQ = Attitude Toward School Questionnaire.
* p < .05.
** p < .01.


group of NFT children whose performance
on these two measures was influenced by the
program of the particular school they were
attending.


Comparisons of FT Economically 
Disadvantaged and FT 
Nondisadvantaged Children 


The scores of the FT nondisadvantaged 
children at every testing are presented in 
Table 9. At the outset of the program (begin- 
ning kindergarten), these children scored 
higher than the FT economically disadvan- 
taged children on both the PPVT (F = 8.84, 
df = 1/44, p < .01) and STAR (F = 9.99, 
df = 1/42, p < .01). Analyses revealed no 
significant changes in the nondisadvantaged 
children’s PPVT IQs over the course of the 
four-year program. A 2 x 2, Income Group 
(Disadvantaged x Nondisadvantaged) x 
Sex unweighted means analysis of variance 
was conducted on every measure at every 
testing which assessed the effects of the FT 
program. The results of these analyses are 
summarized in Table 10. 


Discussion 
Effects of Head Start 


Given the very small number of FT children who had not had Head Start,
essentially what was examined in this study was the impact of a
five-year compensatory effort: one year of Head Start and four years of
Follow Through. Consistent with current views, the findings of the
present study indicate that a one-year Head Start experience, if not
followed by a subsequent compensatory effort, has little lasting effect
on children's performance. The Head Start effects that were found were
limited to the beginning of kindergarten and end of first grade, and
these pose certain problems of interpretation. At the beginning of
kindergarten, the PPVT effect was due to the inordinately low scores
obtained by non-Head-Start children. This effect disappeared at the end
of kindergarten when the non-Head-Start children showed a 25-point
increase. In view of the evidence (Beller, 1969; Thomas, Hertzig,
Dryman, & Fernandez, 1971; Zigler, Abelson, & Seitz, 1973; Zigler &
Butterfield, 1968) that deleterious motivational factors attenuate the
intelligence test performance of disadvantaged children, it seems
probable from the large IQ increase of the non-Head-Start children that
their low PPVT scores at school entrance were due to such factors
rather than being accurate indicators of their actual
cognitive abilities.

TABLE 9
Scores for FT Nondisadvantaged Boys and Girls

                                   Boys (n = 5)       Girls (n = 5)
Measure                             M       SD         M       SD

End of kindergarten
  [measure illegible]             12.20     9.73     101.60   21.69
End of Grade 1
  PPVT                           126.00    28.28     [illegible]
  MAT, Word Knowledge             29.00     6.24     [illegible]
  MAT, Word Discrimination        30.33     5.03      27.60    7.89
  MAT, Reading Comprehension      31.00    18.52      29.50   14.06
  MAT, Arithmetic                 46.33    20.11      42.60   10.67
  Imitation                        9.75     5.56      10.20    6.22
  Frequency of communication      22.50    24.09       8.60    7.80
  Creative thinking, nonverbal    52.75    15.31      20.00   13.78
  Creative thinking, verbal        --a      --a        7.00    3.83
End of Grade 3
  PPVT                           116.60     9.92      97.80   11.01
  Picture Arrangement Test        31.00     7.87      23.00    5.83
  PIAT, Mathematics               49.60    14.98      34.60   11.38
  PIAT, Reading Recognition       40.80    17.24      36.20    7.36
  PIAT, Reading Comprehension     37.00    12.90      32.80    7.60
  PIAT, Spelling                  41.40    14.66      42.80   13.54
  PIAT, General Information       43.60    13.65      28.60   10.94
  Problem-solving latency         25.40     5.46      25.80    8.29
  ASQ, academic work              16.40     4.04      17.25    3.40
  ASQ, school authorities         12.80     1.64       9.50    1.92
  ASQ, overall attitude          111.40     8.26     115.00   16.31
  Student Self-Image               8.20     2.59       8.80    2.95
  Comments on procedures           9.60     6.15       1.75    2.36
  Requests for help                 .20      .45        .00     .00
  Comments on meaning              4.40     3.29        .00     .00

Note. Abbreviations: FT = Follow Through, PPVT = Peabody Picture Vocabulary Test,
STAR = Screening Test of Academic Readiness, MAT = Metropolitan Achievement Test,
PIAT = Peabody Individual Achievement Test, and ASQ = Attitude Toward School
Questionnaire.

a Scores were obtained for only two nondisadvantaged boys on this measure.


The reemergence of a significant PPVT performance difference at the end
of first grade between FT Head Start children and NFT non-Head-Start
children was partly due to the gains made by FT children and partly due
to losses by NFT non-Head-Start boys. These boys seem not to have
profited as much from their schooling, at least as measured by PPVT
performance, as the other children in NFT schools. The poorer
performance of these boys again may have been due to motivational
factors, in that it was evident that NFT first-grade classrooms did not
provide as warm and supportive an environment for the children as did
NFT kindergartens.

The findings for the first-grade, verbal problem-solving style measures
show the same transitional pattern as the PPVT IQ results. The FT
children were generally ahead of NFT children in verbal performance
abilities, but their superiority was clear-cut only in comparison to NFT
children who had not had Head Start. Two years later this was no longer
the case. On all Grade 3 measures where FT children were superior, they
were superior to both the Head Start and non-Head-Start children in NFT
schools.


Effects of Follow Through 

The findings at the end of third grade 
showed a consistent pattern of superior per- 
formance by FT over NFT children. In the 
longitudinal study, the FT children were 
superior on such intelligence-achievement

TABLE 10
Summary of Findings for Economically Disadvantaged (D) and Nondisadvantaged (ND)
Children in Follow Through

Measure                          Income group   Sex               Income Group × Sex

End of kindergarten
  PPVT                           ND > D**       boys > girls**    ns
  STAR                           ND > D**       ns                ns
End of Grade 1
  PPVT                           ND > D***      boys > girls***   ND boys > [illegible] > D girls*
  MAT, Word Knowledge            ND > D*        ns                ns
  MAT, Word Discrimination       ND > D*        ns                ns
  MAT, Reading Comprehension     ND > D**       ns                ns
  MAT, Arithmetic                ns             ns                ns
  Imitation                      ns             ns                ns
  Frequency of communication     ns             ns                ns
  Creative thinking, nonverbal   ns             boys > girls**    ns
  Creative thinking, verbal      --a            --a               --a
End of Grade 3
  PPVT                           ND > D*        boys > girls***   ns
  Picture Arrangement Test       ns             boys > girls*     ns
  PIAT, Mathematics              ns             boys > girls*     ns
  PIAT, Reading Recognition      ND > D*        ns                ns
  PIAT, Reading Comprehension    ns             ns                ns
  PIAT, Spelling                 ns             ns                ns
  PIAT, General Information      ND > D***      boys > girls**    ns
  Problem-solving latency        ns             ns                ns
  ASQ, academic work             ns             ns                ns
  ASQ, school authorities        ns             ns                ns
  ASQ, overall attitude          ns             ns                ns
  Student Self-Image             ns             ns                ns
  Comments on procedures         ns             ns                ns
  Requests for help              D > ND*        ns                ns
  Comments on meaning            ns             boys > girls*     ns

Note. Abbreviations: PPVT = Peabody Picture Vocabulary Test, STAR = Screening Test of
Academic Readiness, MAT = Metropolitan Achievement Test, PIAT = Peabody Individual
Achievement Test, and ASQ = Attitude Toward School Questionnaire.

a This analysis was not performed because scores were available for only two
nondisadvantaged boys.

* p < .05.
** p < .01.
*** p < .001.


measures as the PPVT, the Picture Arrangement Test, Mathematics, and
General Information. In regard to social-motivational characteristics,
the FT children demonstrated greater curiosity than the NFT children by
commenting more frequently about the meaning and significance of tasks
presented to them. These same differences favoring FT children were
discovered in the cross-sectional study. In addition, in the
cross-sectional study, the FT children were superior to NFT children on
the intelligence-achievement measures of Reading Comprehension and
Spelling. The FT children also displayed a more reflective approach to
problem solving, a greater willingness to ask adults for assistance, and
a more positive image of themselves as students.⁵

It was particularly noteworthy to find 
that a Bank Street approach, which em- 
phasizes the discovery method, was asso- 
ciated with superior performance not only 
on intelligence and social-motivational mea- 
sures but on academic achievement mea- 
sures as well. National assessment findings 
(Stanford Research Institute, 1971) sug- 
gested that discovery approaches were not 
as effective as more structured curriculum 
approaches in producing gains on achieve- 
ment measures. These national findings, 
which were based on assessments after a 
year of school attendance, are consistent 
only with the kindergarten and Grade 1 
findings of the present study indicating no 
differences between FT and NFT children 
on the kindergarten Screening Test of Aca- 
demic Readiness and the Grade 1 Metro- 
politan Achievement Tests. However, FT 
children were superior to NFT children on 
achievement measures after FT children 
had experienced a Bank-Street-type pro- 
gram for four years. There is no way of 
determining from the present study whether 
even better achievement performance would 
accrue with a more structured approach. It 
is important to note, though, that these 
third-grade findings do not support the view 
which has developed (Pines, 1967) that the 
“whole child” approach does not effectively
improve the achievement scores of disadvantaged
children.

The findings of this study must be inter- 
preted with some caution in view of the fact 
that children were not randomly assigned to 
FT and NFT programs at the beginning of 
the study. Whether there were selective fac- 
tors operating and in what direction such 
factors might have influenced the results 
cannot be assessed until replication studies are completed in which
there has been random assignment of children.⁶ (Such a study is now in
progress at this FT program.) It was nevertheless encouraging that
positive effects were visible in the FT children studied here. The
findings certainly call into question the general proposition that
compensatory education is a failure. In addition to demonstrating that
children who received a special program for several years were superior
on a wide array of measures to children who did not receive such a
program, the findings also suggest that other types of remedial efforts
benefit disadvantaged children. Comparison children attending a school
with a mixed socioeconomic grouping and enriched program and a school
with a tutorial program displayed a less impulsive approach to problem
solving, greater mathematics achievement, and a more positive attitude
toward school authorities than did children who were attending a school
which had no special remedial or enrichment programs of any kind.

⁵ The similarity in the test scores of the NFT cross-sectional and NFT
longitudinal groups suggests that the performance of the cross-sectional
children was not impaired by having less testing experience.

⁶ A penetrating discussion of these issues may be found in Campbell and
Erlebacher (1970).

In interpreting the Follow Through pro- 
gram effects, it is important that the FT 
children not only had a prior year of Head 
Start experience but that during their four 
years in the program they were in inter- 
action with a number of children from 
better-educated, higher-income families. 
The finding of better performance in the FT 
and NFT schools with mixed socioeconomic 
and racial groupings is consistent with the 
findings of the Coleman report (Coleman, 
1966). This finding also supports the view 
recently expressed by defenders of busing 
(Pettigrew, Useem, Normand, & Smith, 
1973) that in order for achievement to be 
higher in desegregated schools, other crucial 
conditions (such as adequate school services
and remedial training) must be met, as they
were in this FT program. Just how important
socioeconomic mixing was to this program
will be clarified when studies are completed
of similar FT programs attended only
by economically disadvantaged children. 


Economically Disadvantaged and 
Nondisadvantaged Comparisons 

Although there was no nondisadvantaged 
non-FT group with which to compare the FT 
nondisadvantaged children at the end of 
third grade, there is no evidence that attending
a program in which disadvantaged children
were in the majority was detrimental
to the learning growth of the nondisadvan- 
taged children. These children’s PPVT IQ 
scores did not change over their four years 
in the program. Furthermore, converting the 
nondisadvantaged children’s scores on the 
third-grade achievement tests into grade 
equivalent levels indicated that these chil- 
dren were performing at or (more typically) 
above grade level in all academic areas. 

The nondisadvantaged children attending 
the Follow Through program were generally 
superior to the economically disadvantaged 
children in the program on intelligence and 
achievement measures at all testings. This 
superiority was less marked in the fourth 
year of school than it was in the first two 
years. At the end of Grade 3, no significant 
differences were found between the economi- 
cally disadvantaged and nondisadvantaged 
children on the Picture Arrangement, 
Mathematics, Reading Comprehension, or 
Spelling tests, or on any of the social- 
motivational measures. It would be erro- 
neous to conclude from this that the FT 
program resulted in the disadvantaged chil- 
dren attaining a level of intellectual func- 
tioning equivalent to that of the nondis- 
advantaged children, since on every measure 
where a difference was found, the difference 
was in favor of the nondisadvantaged chil- 
dren. But, given the striking differences in 
the home lives of these children, it would 
surely be naive to expect that a compensa- 
tory program alone would be powerful 
enough to produce equivalent intellectual 
functioning. 

In sum, the findings of this study indicate 
that while the Follow Through program 
assessed was not capable of ameliorating all 
of the negative effects of living in an eco- 
nomically disadvantaged environment, the 
program was highly beneficial to the chil- 
dren who participated in it. The longitudinal 
and cross-sectional evidence together point 
to the conclusion that the gains accruing 
from compensatory education programs are 
commensurate with the duration and 
amount of effort which are expended on 
these programs.


ABSTRACTIVE PROCESSES IN THE REMEMBERING OF PROSE¹


RONALD E. JOHNSON²
Purdue University


The patterning of recall of linguistic subunits of textual prose was
found to be strongly related to the semantic dimensions of
abstractness-concreteness, specificity of denotation, comprehensibility,
and interest. Attesting to the generality of the findings, the
relationships between the textual dimensions and recall were evident with two textual
passages, with two methods of measuring the semantic dimensions, and 
at immediate and delayed retention intervals. Remembering was char- 
acterized as being both a reconstructive and an abstractive process. 


The remembering of semantic information 
from prose appears not to be based upon a 
literal remembering of exact words (Bart- 
lett, 1932). Verbatim recalls are the ex- 
ception, and the usual mode of recall is 
the paraphrase. As the retention interval 
lengthens, correct paraphrasings also de- 
cline, and the number of qualitative changes 
in remembering increases (Johnson, 1962). 
Since semantic units are amalgamated in 
recall, Cofer (1973) reaffirms Bartlett’s 
(1932) theory that memories are stored in 
schematic form and that recalls are con- 
structions from the schemas. 

Zangwill (1972), in contrast, asserts that “remembering is better
described as an abstractive than as a constructive process [p. 128].”
Although Zangwill acknowledged qualitative changes, the predominant
change in recall was noted to be omissions (Gomulicki, 1956; Johnson,
1962). More important, the patterning of omissions was not random, in
that the prose elements which carried action content were remembered
quite well whereas other segments were remembered poorly. Since the
learner had to process all subunits in order to emphasize selected
subunits, Zangwill (1972) argued that the occurrence of selective
omissions reflects an abstractive process in learning.

¹ The specificity data were presented in a paper delivered at the 1972
meeting of the Midwestern Psychological Association, Cleveland. Analyses
of the concreteness data were presented at the 1972 meeting of the
American Psychological Association, Honolulu.

² Requests for reprints should be sent to Ronald E. Johnson, SCC “G,”
Purdue University, West Lafayette, Indiana 47907.

What are the subunits in prose which are 
likely to be remembered? For narrative 
prose, the remembering of individual sub- 
units is positively related to the importance 
of the subunits to the semantic structure of 
the passage (Gomulicki, 1956; Johnson, 
1970). For textual prose, segments judged
important in the logical structure of a
passage were remembered better (Meyer &
McConkie, 1973). Variations in meaningfulness
also are strongly related to the
remembering of textual subunits (Johnson,
1973).

The present research explored the possi- 
bility that additional textual dimensions 
were related to the patterning of omissions. 
For the dimensions of interest, concreteness, 
specificity, and comprehensibility, previous
studies of prose learning have not examined 
the possibility that dimensional variations 
among prose subunits may be related to the 
patternings of recall. The present study 
attempts to fill this empirical gap. 


METHOD 
Materials 


Two textual passages were adapted from college textbooks. One passage,
entitled “The Role of Language in Learning,” was a modified excerpt
from Carroll's (1964) Language and Thought.³ “Language” contained 650
words in 5 paragraphs and 19 sentences. The second passage, “Evolution
of the Brain,” was derived from prose excerpts in introductory
psychology textbooks written by Munn (1946) and Wickens and Meyer
(1955). “Evolution” had 810 words in its 7 paragraphs and 40 sentences.

³ Copies of the textual passages and the instructions given to the
various raters may be obtained from the author.

A sample of 52 college students, enrolled in psychology courses, divided
“Language” into prose subunits by indicating locations which were
psychologically acceptable for pausing (Johnson, 1970). The validity of
a psycholinguistic boundary was assumed when a majority of raters agreed
that the junction was an acceptable location for pausing. Since some
subunits were only one or two words in length, the experimenter reduced
the resulting 69 units to 60 units by combining introductory sentence
modifiers with their clauses and by combining units in seriation. The 60
subunits had a mean word length of 10.8 (SD = 4.8). Seven of the
subunits were intact sentences. Twenty of the pausal boundaries were
located at the junctions between sentences, while 41 boundaries occurred
within sentences. In a determination of the reliability of the
segmentation process, the 52 raters were divided randomly into two
groups of 26 raters, and the pausal locations in the two sets of ratings
showed a Pearson correlation of .98.

Through the same partitioning process used with “Language,” 50 of the 52
raters segmented “Evolution” into 89 pausal units. After the
experimenter combined short introductory modifiers and serial units into
adjacent subunits, the number of prose subunits was 80. Forty pausal
boundaries were located within sentences, and 11 of the sentences were
unbroken by a pausal boundary. The mean number of words in each subunit
was 10.1 (SD = 5.3). A random split of the 50 raters into two groups of
25 raters again showed high reliability for the rater judgments
(r = .97).
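The split-half reliability check described above can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes each rater's judgments are coded as a list of 0/1 marks (1 = acceptable pause) over the candidate junctions, and the function names and the fixed random seed are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(marks, seed=0):
    """Randomly split raters into two halves and correlate, across
    junctions, the number of raters in each half who marked a pause.

    marks: one list per rater, each holding a 0/1 mark per junction
    (this coding is an assumption made for the illustration).
    """
    rng = random.Random(seed)
    order = list(range(len(marks)))
    rng.shuffle(order)
    half = len(order) // 2
    group_a = [marks[i] for i in order[:half]]
    group_b = [marks[i] for i in order[half:]]
    n_junctions = len(marks[0])
    counts_a = [sum(r[j] for r in group_a) for j in range(n_junctions)]
    counts_b = [sum(r[j] for r in group_b) for j in range(n_junctions)]
    return pearson_r(counts_a, counts_b)
```

Correlating per-junction counts, rather than individual marks, matches the description of comparing "the pausal locations in the two sets of ratings."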


Dimensional Ratings 


Raters of abstractness-concreteness were told 
that concrete phrases, such as “the wounded fox 
was relentlessly pursued by the barking hounds,” 
had ready reference to concrete, tangible objects 
or events. In contrast, an abstract phrase, such as
“the contemporary influence had not been anticipated
previously,” was described as more general
and less readily referenced to particular persons, 
places, or things. 

Specificity was to be judged by several criteria. First, the generality
of the words themselves, for example, furniture versus table, was a
determinant of the generality of the prose segment. Second, phrases
telling what something was, for example, wrinkled, were to be judged as
more specific than phrases telling what something was not, for example,
not very smooth. Third, the presence of more detail in a description
ordinarily was associated with greater specificity. Raters were advised,
however, that the length of the phrase was not a reliable indicator of
the amount of detail. Fourth, phrases referring to an exact quantity
were said to be more specific, for example, six bottles as opposed to
many bottles. Fifth, nouns modified by adjectives, and verbs modified by
adverbs were to be judged as more specific than their unmodified
counterparts. Sixth, phrases that included prepositional phrases were
said to be more likely to be high in specificity, for example, car with
a dent in the fender versus car with a dent versus car.

Differences in comprehensibility were said to 
be evident by the ease with which phrases could be 
apprehended or understood. Variations in compre- 
hensibility were said to be most evident during the 
initial reading of passages and also during rapid 
readings. Acknowledging that most phrases in 
textual passages could be apprehended without too 
much difficulty, it was suggested that raters judge 
how comprehensible the phrases would be to 
students who were either younger or less intelligent
than themselves.

Raters of interest were told that within any 
textual passage, including dull passages, some por- 
tions of content were more interesting than others. 
Judgments regarding interest, therefore, were to be 
made in the context of how interesting a phrase 
was in comparison with other phrases in the same 
textual passage. 

Each textual dimension was assessed by two 
methods of rating. In the eliminative method 
(Johnson, 1970), raters judged the prose units by 
eliminating units which were lowest on the dimen- 
sion being rated. A count of the number of times 
that a subunit was allowed to remain in the passage 
provided a measure of the unit's relative status on 
that particular dimension. Subunits were elimi- 
nated qua subunits, and the total word count for 
all remaining subunits had to be within 15 words 
of a specified number. Equal numbers of raters 
were assigned to groups eliminating either one 
fourth, one half, or three fourths of the words.
Each rater judged both textual passages and 
eliminated the same proportion of words in each 
passage. The raters’ passages had diagonal slashes 
separating the subunits, and word counts were 
printed in the margins. 
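The eliminative scoring just described amounts to counting, for each subunit, how many raters left it in the passage. A minimal sketch follows; the data layout (each rater represented by the set of subunit indices retained) and the function names are assumptions made for illustration only.

```python
def retention_counts(kept_by_rater, n_subunits):
    """Count how many raters retained each subunit.

    kept_by_rater: one set of retained subunit indices per rater
    (a hypothetical coding of the raters' marked-up passages).
    """
    counts = [0] * n_subunits
    for kept in kept_by_rater:
        for idx in kept:
            counts[idx] += 1
    return counts


def within_word_budget(kept, word_counts, target, tolerance=15):
    """Check that the words remaining fall within `tolerance` words of
    the specified target, as the rating instructions required."""
    remaining = sum(word_counts[i] for i in kept)
    return abs(remaining - target) <= tolerance
```

A higher count marks a subunit that raters were reluctant to eliminate, that is, one standing higher on the dimension being rated.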

Raters were recruited from introductory classes 
in psychology, and they performed the ratings in 
exchange for experimental credit. Independent 
groups of subjects rated each of the four dimen- 
sions (numbers of raters making eliminative judg- 
ments: concreteness, n = 72; specificity, n = 75; 
comprehensibility, n = 78; and interest, n = 99). 

The textual dimensions also were assessed by 7-point categorical rating
scales. Prior to performing the ratings, the raters read the passage in
its usual prose format. The rating form itself had a columnar listing of
the subunits followed by a 7-point categorical scale. To minimize bias,
the directionality of the scale was reversed for one half of the raters.
A “1” or “7” on the scale was labeled “Least ___” or “Most ___,” and “4”
was “average.” As with the eliminative method, raters (n = 220) judged
each passage on only a single dimension (numbers of raters:
concreteness, n = 50; specificity, n = 66; comprehensibility, n = 40;
and interest, n = 64). For the statistical analyses, one set of ratings
on each dimension was transposed to provide a common directionality. The
7-point ratings of the 60 subunits of “Language” showed a mean
correlation of .85 (range = .72-.94) with the number of times that
raters retained the subunits in the eliminative method. For the 80
subunits of “Evolution,” the mean correlation of the two sets of
dimensional ratings was .76 (range = .56-.86).
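Transposing the reversed forms to a common directionality can be sketched as below. The sketch assumes the transposition was the usual reflection of a 7-point scale (a rating x becomes 8 - x); the source does not spell out the arithmetic, and the function names are hypothetical.

```python
def to_common_direction(rating, reversed_form, scale_max=7):
    """Reflect a rating from a reversed form so that a high value always
    means 'most' on the dimension (assumed transformation: max + 1 - x)."""
    return scale_max + 1 - rating if reversed_form else rating


def mean_subunit_ratings(forms):
    """Average ratings per subunit after aligning scale direction.

    forms: list of (ratings, reversed_form) pairs, one per rater, where
    ratings holds one 1-7 value per subunit.
    """
    aligned = [
        [to_common_direction(r, is_reversed) for r in ratings]
        for ratings, is_reversed in forms
    ]
    n_raters = len(aligned)
    n_subunits = len(aligned[0])
    return [sum(a[j] for a in aligned) / n_raters for j in range(n_subunits)]
```

The midpoint "4" is unchanged by the reflection, which fits its fixed "average" label on both versions of the form.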


Measurement of Recall 


Undergraduate students in psychology courses 
were randomly assigned to immediate or delayed 
reproduction groups. All subjects within a class- 
room received the same textual excerpt. Learners 
were advised to “read the textual selection two 
times at your regular reading rate” and that “some- 
time in the future you will be tested on the 
accuracy of your recall.” Following the readings, 
subjects in the delayed groups were dismissed, and 
the remaining subjects were asked to “write the 
textual passage as accurately as you possibly can. 
Recall the exact words and ideas if you can.” After 
reproductions were completed, subjects were re- 
quested to avoid discussing the experiment. 

Immediate reproductions of “Evolution” were 
attempted by 58 subjects, and seven-day reproduc- 
tions were produced by 56 subjects. Reproduc- 
tions of “Language” were attempted by 61 learners 
at the immediate interval and 46 learners at the 
delayed interval. The lower sample sizes at the 
delayed intervals resulted from absences and from 
claims that nothing could be remembered. Since the 
loss in subjects was restricted to the delayed interval,
the data for each retention interval were
analyzed separately. Due to the attrition, no
conclusions may be drawn regarding differential losses
in retention as a function of different levels of the
semantic dimensions.

The verbatim reproduction of any prose subunit was a rarity, and a
subunit was judged to be recalled if the unit was represented in any
form in the learner’s reproduction. Two trained undergraduate raters
independently judged the protocols. Discrepancies in scoring were
resolved through conference. If agreement was not reached, a third
scorer decided the issue.

For each of the four semantic dimensions, as measured by each of the two
methods, the prose subunits were rank ordered according to their status
on that dimension. Following the rank ordering, the subunits were
separated into four groupings ranging from the lowest one fourth on the
dimension to the highest one fourth. The learner then received four
scores based upon the number of units recalled from the lowest one
fourth, the second lowest one fourth, the third lowest one fourth, and
the highest one fourth. Since “Language” contained 60 prose subunits,
the maximum score at any level was 15. The 80 subunits of “Evolution”
were similarly divided into four levels of 20 units.
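The scoring scheme above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: subunits are identified by index, ties in the dimensional ratings are broken by passage order (the source does not say how ties were handled), and the number of subunits is divisible by four, as it is for both passages.

```python
def quartile_levels(ratings):
    """Rank subunits by their dimensional rating and split them into four
    equal groups, lowest fourth first. Ties keep passage order (assumed)."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])
    size = len(order) // 4
    return [set(order[k * size:(k + 1) * size]) for k in range(4)]


def level_scores(recalled, levels):
    """Give a learner four scores: units recalled at each of the four levels.

    recalled: set of subunit indices judged present in the reproduction.
    """
    return [len(recalled & level) for level in levels]
```

With 60 subunits each level holds 15 units, and with 80 subunits each holds 20, so the per-level maxima follow directly from the split.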


RESULTS 


Table 1 shows mean recall as related to levels of concreteness. Repeated
measures analyses of variance were computed separately for the two
rating methods and for the two retention intervals. As indicated by the
F ratios, each analysis gave evidence that recall was superior for
pausal subunits high in concreteness (ps < .001). Prose subunits in the
upper one fourth in concreteness were recalled approximately three to
nine times better than subunits in the lower one fourth in concreteness.
The facilitative effects of concreteness are consistent with previous
research focusing on the recall of individual words from sets of
unrelated sentences (James, 1972), and the overall recall of passages
differing in their constituent proportions of abstract or concrete words
(Yuille & Paivio, 1969).

In paired-associate learning, higher levels 


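TABLE 1
MEAN RECALL OF TEXTUAL SUBUNITS AS A FUNCTION OF ABSTRACTNESS-CONCRETENESS

Levels of concreteness: Lowest, Second, Third, Highest

Textual passage | Dimensional ratings | Retention interval | Lowest | Second | Third | Highest | F* | df
“Language” | eliminative | immediate | 1.28 | 3.61 | 6.03 | 9.30 | 252.10 | 3/180
 | | delayed | .43 | [illegible] | [illegible] | [illegible] | [illegible] | 3/135
“Evolution” | eliminative | immediate | 3.26 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
 | | delayed | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | 3/165
“Language” | categorical | immediate | 2.02 | [illegible] | [illegible] | [illegible] | [illegible] | 3/180
 | | delayed | .65 | [illegible] | [illegible] | [illegible] | [illegible] | 3/135
“Evolution” | categorical | immediate | 2.66 | [illegible] | [illegible] | [illegible] | [illegible] | 3/171
 | | delayed | .73 | 3.21 | 3.43 | 7.11 | 130.66 | 3/165

* All ps < .001.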
TABLE 2
MEAN RECALL OF TEXTUAL SUBUNITS AS A FUNCTION OF SPECIFICITY OF DENOTATION

Levels of specificity: Lowest, Second, Third, Highest

Textual passage | Dimensional ratings | Retention interval | Lowest | Second | Third | Highest | F* | df
“Language” | eliminative | immediate | 2.36 | 3.54 | 5.52 | 8.79 | 153.36 | 3/180
 | | delayed | 1.07 | 1.50 | 2.17 | 8.41 | 22.10 | 3/135
“Evolution” | eliminative | immediate | 3.71 | 4.98 | 10.24 | 11.03 | 187.79 | 3/171
 | | delayed | 1.95 | 1.50 | 4.85 | 6.20 | 101.11 | 3/165
“Language” | categorical | immediate | 1.03 | 4.03 | 6.18 | 8.97 | 217.86 | 3/180
 | | delayed | .43 | 1.61 | 2.74 | 3.37 | 39.91 | 3/135
“Evolution” | categorical | immediate | 3.71 | 5.36 | 8.47 | 12.43 | 174.90 | 3/171
 | | delayed | 1.68 | 2.02 | 3.41 | 6.77 | 87.92 | 3/165

* All ps < .001.


of stimulus specificity are associated with
superior remembering (Paivio & Olver,
1964). As shown in Table 2, prose units
rated high in specificity also are more likely
to be recalled (ps < .001). The only numerical
reversal to the relationship was not significant
in a Newman-Keuls analysis (p > .05).

Research on comprehensibility gives indirect
support to the expectation that verbal
units high in comprehensibility would be
more likely to be retained (Carroll, 1971).
In the present experiment, as shown in Table
3, higher comprehensibility was associated
with superior remembering (ps < .001). The
only significant reversal of trend occurred
between “Evolution’s” lowest and second
levels as determined by the eliminative
method (ps < .01). With the categorical
method, the lowest and second levels did not
differ significantly (ps > .05).


Although interest is reputed to be a prime 
determinant of remembering, a search of 
the literature uncovered only two studies, 
and both studies found no relationship be- 
tween interest and the retention of discourse 
(Becker, 1963; Klare, Mabry, & Gustafson, 
1955). In the present study, as shown in 
Table 4, higher levels of interest were asso- 
ciated with superior remembering (ps < 
.001). The numerical reversals to the gen- 
eral trend were not significant (ps > .05). 

To summarize, the four textual dimen- 
sions were strongly related to both im- 
mediate and delayed recall. Regardless of 
the method of measurement, prose subunits 
rated high on a particular dimension were 
associated with superior recall.

The comparability of the four dimensions 
in predicting recall suggests the possibility 
that the dimensions are simply overlapping 
measures of the same entity. Empirically, 


TABLE 3
MEAN RECALL OF TEXTUAL SUBUNITS AS A FUNCTION OF COMPREHENSIBILITY

Levels of comprehensibility: Lowest, Second, Third, Highest

Textual passage | Dimensional ratings | Retention interval | Lowest | Second | Third | Highest | F* | df
“Language” | eliminative | immediate | 2.97 | 5.33 | 5.21 | 6.75 | 65.49 | 3/180
 | | delayed | .98 | 2.15 | 2.52 | 2.50 | 17.87 | 3/135
“Evolution” | eliminative | immediate | 8.14 | 5.90 | 7.84 | 7.81 | 15.62 | 3/171
 | | delayed | 4.07 | 2.64 | 3.41 | 4.32 | 5.79 | 3/165
“Language” | categorical | immediate | 2.77 | 3.66 | 5.97 | 7.82 | 128.32 | 3/180
 | | delayed | .87 | 1.26 | 3.17 | 2.85 | 31.27 | 3/135
“Evolution” | categorical | immediate | 6.19 | 6.14 | 9.03 | 8.60 | 46.39 | 3/171
 | | delayed | 2.27 | 2.20 | 4.91 | 5.11 | 46.13 | 3/165

* All ps < .001.
TABLE 4
MEAN RECALL OF TEXTUAL SUBUNITS AS A FUNCTION OF INTEREST

Levels of interest: Lowest, Second, Third, Highest

Textual passage | Dimensional ratings | Retention interval | Lowest | Second | Third | Highest | F* | df
“Language” | eliminative | immediate | [illegible] | 5.26 | 5.16 | 5.82 | 17.93 | 3/180
 | | delayed | [illegible] | 2.24 | 2.37 | 2.41 | 12.15 | 3/135
“Evolution” | eliminative | immediate | [illegible] | 6.50 | 7.88 | 9.97 | 64.20 | 3/171
 | | delayed | [illegible] | 2.57 | 3.71 | 5.09 | [illegible] | [illegible]
“Language” | categorical | immediate | .84 | 4.02 | 6.10 | 6.26 | [illegible] | [illegible]
 | | delayed | [illegible] | 1.74 | 2.48 | 2.87 | 21.51 | 3/135
“Evolution” | categorical | immediate | .29 | 7.72 | 7.83 | 11.12 | 145.09 | 3/171
 | | delayed | [illegible] | 3.64 | 3.36 | 6.05 | 64.09 | 3/165

* All ps < .001.


each textual dimension was significantly 
correlated with at least one other textual 
dimension.* On a theoretical basis, too, the 
textual dimensions appear to be closely 
interrelated. Whether a subunit was judged 
to be interesting, for example, might depend 
upon the ease of comprehension, or perhaps 
directly upon concreteness or specificity. 

In assessments of the independence of the 
textual dimensions, stepwise multiple regres- 
sion analyses and factor analyses were com- 
puted. For each textual passage, the de- 
pendent variable in the regression analyses 
was the number of recalls of each textual 
subunit. The potential predictors included
the eight measures of the textual dimensions, 
the number of words in each subunit, and 
serial position. Entry into the regression 
equation was permitted only when the new 
predictor accounted for a significant increment
of the residual variance.

Analysis of the immediate recalls of the
subunits of “Language” showed three variables
accounting for independent portions of
the recall variance. The order of entry into
the equation, and the results of tests to
determine entry, were concreteness, eliminative
scaling (F = 109.14, df = 1/58, p < .001);
comprehensibility, eliminative scaling
(F = 9.15, df = 1/57, p < .01); and
comprehensibility, categorical scaling (F = 4.70,
df = 1/56, p < .05). The optimal set of
three predictors showed a multiple correlation
of .85 with immediate recall. In the


* Tables showing the intercorrelations among
variables are available from the author.


delayed recall of “Language,” an optimal
set of three variables showed a multiple
correlation of .74 with the criterion variable:
concreteness, eliminative scaling (F = 41.38,
df = 1/58, p < .001); serial position
(F = 9.37, df = 1/57, p < .01); and interest,
categorical scaling (F = 5.60, df = 1/56,
p < .05). The immediate recall of “Evolution”
showed a multiple correlation of .79
with a set of five independent predictors:
specificity, categorical scaling (F = 49.51,
df = 1/78, p < .001); comprehensibility,
categorical scaling (F = 21.62, df = 1/77,
p < .001); specificity, eliminative scaling
(F = 4.97, df = 1/76, p < .05);
comprehensibility, eliminative scaling (F = 6.01,
df = 1/75, p < .05); and interest, categorical
scaling (F = 6.31, df = 1/74, p < .05). A
set of six variables showed a multiple
correlation of .75 with the delayed recall of
“Evolution”: concreteness, categorical scaling
(F = 30.53, df = 1/78, p < .001);
specificity, eliminative scaling (F = 6.28,
df = 1/77, p < .05); serial position
(F = 8.13, df = 1/76, p < .01); comprehensibility,
categorical scaling (F = 8.61, df = 1/75,
p < .01); comprehensibility, eliminative
scaling (F = 10.40, df = 1/74, p < .01);
and interest, categorical scaling (F = 5.07,
df = 1/74, p < .05).

In summary, for each textual passage and
each retention interval, the regression
analyses provided evidence that at least two
textual dimensions were independent predictors
of recall. The particular variables entering
into the prediction equations differed
somewhat in the various analyses, but each
textual dimension emerged as an independent
predictor in at least two analyses. In
addition, serial position did not predict
immediate recall, but primacy effects existed
in delayed retention.

Factor analyses provided additional
information on the independence of the
various textual dimensions. Principal-components
solutions, with communalities estimated
as unity, were followed by orthogonal
rotation. With the criterion for rotation set
at eigenvalues greater than unity, three
factors were rotated in each of the prose
passages. As defined by factor loadings of
.30 or higher, the variables (and loadings)
of Factor 1 of “Language” were concreteness,
categorical scaling (.95); concreteness,
eliminative scaling (.95); comprehensibility,
categorical scaling (.87); comprehensibility,
eliminative scaling (.82); specificity,
categorical scaling (.57); and specificity,
eliminative scaling (.49). Factor 2 was defined by
interest, categorical scaling (.87); interest,
eliminative scaling (.83); number of words
(.77); specificity, eliminative scaling (.63);
specificity, categorical scaling (.61); and
concreteness, eliminative scaling (.36). The
third factor was composed of serial position
(.90), and specificity, eliminative scaling
(.35). With immediate recall in the analysis,
the variable loaded on Factor 1 (.78) and
Factor 2 (.37). Delayed recall was included
in Factor 1 (.60), Factor 2 (.42), and Factor
3 (−.42).

The first factor of “Evolution” was defined
by specificity, categorical scaling (.91);
specificity, eliminative scaling (.87);
concreteness, eliminative scaling (.81);
concreteness, categorical scaling (.74); interest,
categorical scaling (.76); interest, eliminative
scaling (.66); and number of words
(.36). Factor 2 included comprehensibility,
categorical scaling (.92); comprehensibility,
eliminative scaling (.88); concreteness,
categorical scaling (.45); concreteness,
eliminative scaling (.40); interest, categorical
scaling (−.36); and number of words
(−.80). The third factor consisted of serial
position (.88); interest, eliminative scaling
(.89); and concreteness, categorical scaling
(−.35). Immediate recall loaded only on
Factor 1 (.76), while delayed recall had
loadings on Factor 1 (.51) and Factor 3

Overall, the factor analyses provided cor- 
roborative evidence of some independence 
among the textual dimensions. Three fac- 
tors were required to account for variations 
among the textual dimensions. Further, the 
entry of recall variables into more than one 
factorial cluster of textual dimensions sug- 
gests that orthogonal dimensions of prose 
can account for independent portions of the 
recall variance. 


DISCUSSION


Recall was strongly related to the 
semantic dimensions of abstractness-con- 
creteness, specificity of denotation, compre- 
hensibility, and interest. Unlike previous re- 
search with the semantic dimensions, the 
psycholinguistic segments were partitioned 
in consecutive serial order, without omission, 
from a larger passage of intact textual prose. 
The successful predictions of the pattern- 
ings of textual recall thus were made for 
all subunits within a prose passage. Attest- 
ing to the generality of the findings, the rela- 
tionships were evident with two textual 
passages, with two methods of measuring 
the dimensions, and at two retention inter- 
vals. The strength of the relationships be- 
tween recall and each of the textual dimen- 
sions also is evident from the fact that the 
prose subunits were not selected to differ 
maximally on any one of the textual dimen- 
sions, and each dimension was compared 
with the same sets of recall data. 

The similarity in factorial solutions for 
the two textual passages suggests some in- 
variance in the factorial composition of 
textual prose. Complexities in structure, 
however, also are evident from the emer- 
gence of multiple factors. If additional tex- 
tual dimensions had been entered into the 
factor analyses, additional factors probably 
would have emerged. Carroll's (1960) fac- 
tor analysis of variations between passages, 
for example, led to a solution with six inter- 
pretable factors, and a comparable com- 
plexity probably exists among subunits 
within a single passage.

Complexities in the semantic structure
of prose suggest the possibility of different
influences on recall. In the free recall of
nouns, for example, adjectival modification 
led to an inferior remembering of nouns 
(Cofer, Segal, Stein, & Walker, 1969). For 
prose, however, adjectival modification of 
nouns neither aided nor hindered recall, 
whereas the specificity level of nouns per se 
was a determinant of remembering (August, 
Proctor, Hynes, & Johnson, 1974). In the 
present study, then, the various determi- 
nants of specificity presumably did not 
exert a unitary influence on recall. Even so, 
denotative specificity was strongly asso- 
ciated with remembering, and this finding 
suggests the appropriateness of reexamin- 
ing the data relevant to the widely quoted 
principle that generalizations are remem- 
bered better than specific facts (e.g., Mc- 
Dougall, 1958). 

The dimensional ratings of the textual 
segments were inversely related to the pat- 
terning of selective omissions. Although such 
evidence is consistent with Zangwill’s 
(1972) contention that the learning of prose 
is an abstractive process, the data do not 
argue against a reconstructive theory of 
remembering. Cofer (1973) contrasts the 
processes of reconstruction and abstraction 
as alternative explanations of remembering, 
but the processes are not mutually exclu- 
sive. Separate ideational units in prose do 
become amalgamated in remembering (e.g., 
Bartlett, 1932), but the occurrence of selec- 
tive omissions also demonstrates that cer- 
tain types of ideational units are more likely 
to survive in recall. 

Since various textual dimensions were
independent predictors of recall, the process
of abstraction appears to be influenced by a
number of textual characteristics. Further,
since selective omissions in immediate recall
are evident even when equivalent learning
times are allocated to each subunit (Johnson,
1970, Experiment III), the abstractive
process does not appear dependent upon
differential allocations of processing time in
learning or upon differential rehearsals after
apprehension. The occurrence of abstraction,
however, could stem from differential
processing of the various prose units at the time
of learning. Such selective processing might
be derived from learner strategies or from
existing knowledge structures, and an
examination of the relation between textual
ratings and prereading cloze performance
would provide relevant evidence on this
issue. Since selective omissions were evident
at immediate and delayed retention
intervals, the events occurring during the
retention interval do not appear implicated in the
abstractive process. Another possibility,
though, is that abstraction is determined by
conditions existing at the time of recall.
With sufficient prompts for recall, for
example, the data might show that omitted
units are represented in memory even though
such units ordinarily are not recalled.

At the applicational level, the modification
of textual content toward greater
comprehensibility, interest, concreteness, or
specificity presumably would result in
increases in recall. Since dimensional variations
might be correlated with differences in
information load or predictability, appropriate
empirical tests would require the control
of rates of information transmission
(Rubenstein & Aborn, 1958).

A model attributing poor recall to slow speech-motor encoding is
described. The following results, predicted from the model, were obtained
with poor readers of normal intelligence (n = 24) and normal readers 
(n = 24) from ages 7 to 13: (a) Poor readers named visually presented, 
nonword stimuli more slowly than normal readers; (b) fewer poor 
readers than normal readers employed a cumulative rehearsal strategy 
during a probed-recall task; (c) the use of cumulative rehearsal was 
significantly related to naming speed; (d) the performance of poor 
readers was inferior to that of normal readers for all but the most re- 
cently presented items of the probe-recall task; and (e) naming speed 
and use of cumulative rehearsal accounted for 91% of the true variance 
of early and middle serial positions of the probed recall task. 


Certain children, although they test from 
normal to high on intelligence and have had 
adequate instruction, learn to read and spell 
only with extraordinary difficulty, if at all. 
In this report, poor readers of this type are 
called dyslexic children. Although dyslexic 
children generally perform well on several 
subtests of the Wechsler Intelligence Scale 
for Children (WISC), investigators have 
consistently found that their scores are low 
on the Digit Span subtest (Klasen, 1972). 
In addition to the Digit Span subtest, dys- 
lexic children are deficient on a variety of 
other serial memory tests using visual and 
auditory input (Doehring, 1968). The
present study tested a model attributing the
impaired recall of dyslexic children to slow
speech-motor encoding.

The hypothesized relation of memory
and encoding speed is based on a two-storage
memory model (Atkinson & Shiffrin,
1968). As an item is presented, it is covertly
encoded to a speech-motor response and
briefly stored in a highly transient short-term
storage. Items in short-term storage
are rapidly forgotten due to displacement
by subsequent items. If items in short-term
storage are rehearsed, they may be transferred
to a more durable long-term storage.
Increasing the presentation rate, however,
decreases the time between presentations
that is available for rehearsal, thereby
reducing the probability of recalling all but
the most recently presented items (Ellis &
Hope, 1968; Glanzer & Cunitz, 1966). Recall
of the two or three most recently
presented items is not affected by increasing
the rate of presentation because of the
high probability that these items are still
available in short-term storage.

Unusually slow speech-motor encoding
should have an effect similar to a fast rate of
presentation. For individuals who are
unusually slow speech-motor encoders, little
time is available for rehearsal between item
presentations. This should reduce the probability
of recalling all but the most recently
presented items; therefore, the performance
of dyslexic children, who are hypothesized
to be slow encoders, should be inferior to
that of normal children for all but the most
recently presented items. In addition, there
should be a strong correlation between
recall and encoding speed.




ENCODING SPEED, REHEARSAL, AND PROBED RECALL 


Speech-motor encoding speed was mea- 
sured, in the present study, by asking sub- 
jects to name digits, color patches, and pic- 
tures as rapidly as possible. It is well known 
that verbal stimuli such as digits, letters, 
and words are named faster than concrete 
stimuli such as color patches and pictures. 
In a previous study, the difference between 
encoding speeds of dyslexic and normal chil- 
dren was larger for digits (verbal stimulus) 
than for colors and pictures (concrete 
stimuli). A similar interaction was ex- 
pected in the present study. 

Memory was tested with a probe-recall 
task. A series of cards were briefly exposed 
one at a time and placed face down before 
the subject. Each card contained a digit. A
probe digit was then presented, and the sub- 
ject’s task was to point to the face-down 
card with the matching digit. 

We were able, using the probe task, to
observe individual differences in visual scanning
patterns. Atkinson, Hansen, and Bernbach
(1964) timed responses of preschool
children on a probe-recall task. They found
that response latency increased linearly
from the most recent to the earliest probed
positions, and they interpreted this as evidence
that subjects counted backwards
from the most recent item, using interitem
associations to determine how far back the
probed item occurred. In a pilot study with
young adults, we obtained results suggesting
a different strategy. Following presentation
of the probe, eye movements were observed.
These older subjects typically scanned forward
from the first card, stopping on the
card of their choice. It seems reasonable
that direction of scanning should differ at
different developmental levels. This difference
probably reflects the employment of
a cumulative rehearsal strategy in which an
attempt is made to recall previously presented
items in their correct order each
time a new item is presented. The effectiveness
of cumulative rehearsal, of course, depends
on locating the probed item by visually
scanning forward from the first card
while reciting the items in the rehearsed
order. Because many dyslexic children were
expected to encode too slowly to permit
cumulative rehearsal, we expected that
fewer dyslexic than normal children would
employ forward scanning. In addition, a
strong correlation between encoding speed
and evidence of forward scanning was predicted.

Unpublished study entitled “Encoding Speed
and Memory Span in Dyslexic Children,” 1973.


METHOD 


Subjects 


Two groups of boys, ranging from 7.6- to 13.4- 
years-old, were selected from public elementary 
schools. The groups were closely matched on age 
and approximately matched on father’s occupation 
as indexed by the Hollingshead Scale of Social
Position. One group (n = 24), designated dyslexic,
was composed of poor readers selected from special 
classes. Boys in this group were tested on the Wide 
Range Achievement Test. They scored, on the 
average, 2.3 years below expected grade level in 
reading, 2.8 years below grade level in spelling, 
and 1.9 years below grade level in arithmetic. Re- 
cent WISC IQ scores were available for these 
boys. Their mean full-scale IQ was 102, and scores 
ranged from 90 to 115. A second group (n = 24), 
designated normal, was selected from boys in regu- 
lar classrooms whose reading abilities were close to 
their expected grade levels. Depending on grade, 
their reading levels were measured by the 
Cooperative Primary Reading Test, Stanford 
Achievement Test, or California Test of Basic 
Skills. The average reading score of normal readers 
was 2 years above expected grade level. The
intelligence of normal readers was not determined.


Tasks 


In the probe-recall task, eight cards were exposed 
one at a time and placed face down, from subject’s 
left to right. The cards, presented at a 1.5-second 
rate, contained the Digits 1 through 8 in random 
order. A probe digit was then presented, and sub- 
ject’s task was to point to the card with the match- 
ing digit. If the first choice was incorrect, a second 
choice was given. Twenty-four trials were given, 
preceded by 2 practice trials. This allowed each 
position to be probed three times. Within blocks of 
8 consecutive trials, all 8 positions were probed 
once in random order. 
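
The probing constraint described above (three blocks of 8 trials, with every position probed once per block in random order) can be sketched as follows. This is an illustrative reconstruction only; the function name `probe_schedule` and the use of a seeded pseudo-random generator are assumptions, and the source does not say how the random orders were actually produced.

```python
import random


def probe_schedule(n_positions=8, n_blocks=3, seed=None):
    """Return the probed serial position for each trial.

    Within every block of n_positions consecutive trials, each position
    is probed exactly once in random order, so each position is probed
    n_blocks times overall (3 times across 24 trials in the task above).
    """
    rng = random.Random(seed)  # seed is a hypothetical convenience for reproducibility
    schedule = []
    for _ in range(n_blocks):
        block = list(range(1, n_positions + 1))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


schedule = probe_schedule(seed=0)
print(len(schedule))  # 24 trials in total
```

Blocking the randomization this way guarantees equal numbers of probes per position, which a simple random draw over 24 trials would not.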

Digit naming was tested by requiring subjects 
to name 50 randomly sequenced digits, typed in a 
single row, as fast as possible. Naming was from 
left to right. Two-syllable digits (0 and 7) were 
excluded. Time and errors were recorded. Two
forms were developed for the digit-naming test. 

Color naming was tested by requiring subjects
to name 30 color patches as fast as possible. The
color patches were 3/4-inch squares arrayed on an
8 × 10 inch plate in 5 rows and 6 columns. Naming
was from left to right and top to bottom. The
following colors were sequenced randomly, except
that no two adjacent color patches were the same:
red, orange, yellow, blue, green, black, and brown.
Time and errors were recorded.

Picture naming was tested by requiring subjects 
to name 25 tinted line-drawings as fast as possible. 
The drawings were on 2-inch squares arrayed on a
12 × 12 inch plate in 5 rows and 5 columns. Naming
was from left to right and top to bottom. The
following pictures were used: wagon, pear, duck, 
umbrella, girl, gun, nurse, moon, fish, orange, leaf, 
cat, flag, monkey, tree, pig, ball, dog, house, apple, 
turtle, lamp, kite, king, and star. Time and errors 
were recorded. 


Procedure 

Subjects were tested individually on identical 
tasks in a fixed order. Two testers were used. Half 
of the boys tested by each tester were normal 
readers, and half were poor readers. Testers did 
not know which boys were poor readers. Naming 
tasks were given in the following order: digits 
(Form A), colors, pictures, digits (Form B). After 
each naming task, six trials of the probe-recall 
task were given. 


RESULTS AND DISCUSSION


Naming Tests 


Mean error rates of dyslexic and normal
subjects were less than .02 for each type of
stimulus. Although dyslexic boys tended to
make more errors than normal boys,
differences were small and not significant.

FIGURE 1. Mean naming speeds of dyslexic and
normal boys for digits, colors, and pictures.
(Horizontal axis: stimulus materials.)

Mean naming speeds, expressed as items
per second, are shown in Figure 1. The
data were analyzed with a 2 × 3 × 4
repeated measures analysis of variance. The
first factor was reading ability, with two
levels: dyslexic and normal. The second
factor was age, with three levels: young
(91-120 months), middle (121-144 months),
and old (145-161 months). The third factor
was stimulus, with four repeated measures:
digits (Form A), colors, pictures, and digits
(Form B).

All main effects were significant. Dyslexic
boys were slower than normal boys (F =
63.8, df = 1/42, p < .001). Thus, a major
hypothesis was confirmed. In addition,
younger children were slower than older
children (F = 7.8, df = 2/42, p < .005),
and performance on colors and pictures
(concrete stimuli) was slower than performance
on digits (verbal stimulus; F = 197.5,
df = 3/126, p < .001).

A significant Age x Stimulus interaction 
was found (F = 3.4, df = 6/126, p < .005). 
This interaction was due to a larger dif- 
ference between younger and older subjects 
on digit-naming speed than on speed of 
naming colors and pictures. The interaction
may be due to the possibility that children,
as they advance in school, gradually accumulate
more practice in speech-motor
encoding of verbal stimuli than concrete 
stimuli. This would lead to a gradual di- 
vergence of encoding speeds for verbal and 
concrete stimuli as children grow older. 

As expected, a significant Ability ×
Stimulus interaction was also found (F =
18.6, df = 3/126, p < .001). This interaction
was due to a larger difference between
dyslexic and normal subjects on digit-naming
speed than on speed of naming
colors and pictures. The interaction is congruent
with the common clinical observation
that dyslexic children are specifically impaired
on tasks requiring perception of verbal
material, while they evidence no
dramatic inability to function in an environment
of concrete stimuli. This Ability
× Stimulus interaction may be explained in
different ways. First, the interaction may be
due to the possibility that dyslexic children
practice speech-motor encoding of verbal
material, relative to concrete material, less
than normal children. This may be due to
a lack of motivation to complete school
assignments. The interaction may be explained
differently, however. Dyslexic and normal
children may practice speech-motor encoding
equally, with both groups accumulating
more practice for verbal than concrete material.
But due to a neurological deficit, encoding
speed may be enhanced by practice
less for poor readers than for normal children.
This combination of experiential and
neurological factors would also produce an
Ability × Stimulus interaction.


Memory Test 


Following presentation of the probe on
each trial of the memory test, eye movements
were observed. Subjects who typically
scanned the cards from left to right,
finally stopping on the card of their choice,
were classified as scanners. All others were
classified as nonscanners. As expected, the
incidence of nonscanners was significantly
greater among dyslexic boys. There was
only 1 nonscanner in the normal group,
while 11 dyslexic boys were nonscanners
(χ² = 9.0, p < .01, two-tailed). This difference
indicates that significantly fewer
dyslexic boys employed cumulative rehearsal
than normal boys. Evidence supporting
this interpretation will be presented
later in this section.
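
As a check on the group comparison above, the 2 × 2 table of nonscanners and scanners (1 and 23 among 24 normal boys, 11 and 13 among 24 dyslexic boys) can be tested with a chi-square statistic. The sketch below is an assumption-laden illustration: the report does not say whether a continuity correction was applied, but Yates' correction yields a value of 9.0, whereas the uncorrected statistic is about 11.1. The function name `chi_square_2x2` is introduced here for illustration.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic (1 df) for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction subtracts n/2 from
    |ad - bc| before squaring (an assumption about the original analysis).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


# Rows: normal, dyslexic; columns: nonscanners, scanners.
print(chi_square_2x2(1, 23, 11, 13))               # 9.0
print(round(chi_square_2x2(1, 23, 11, 13, yates=False), 2))  # 11.11
```

Either value exceeds 6.63, the critical value of chi-square with 1 degree of freedom at the .01 level, so the conclusion does not depend on the correction.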

Three serial-position curves, showing percentages
of correct first-choice responses
for each position, are shown in Figure 2. The
performance of normal boys is shown by a
single curve which includes the data from
the lone nonscanner. The performance of
dyslexic boys is shown by two curves: one
for scanners and another for nonscanners.
All subjects were more likely to recall late
items than middle items (recency effect). In
addition, normals and dyslexic scanners
were more likely to recall early items than
middle items (primacy effect). A primacy
effect, however, was not obtained for dyslexic
nonscanners. Thus, only two of the
three curves have the bow shape that
is characteristic of serial-position recall
curves.

FIGURE 2. Percentages of correct first-choice
responses at each serial position for dyslexic boys
(scanners and nonscanners), and normal boys.
(Vertical axis: percent correct first responses;
horizontal axis: serial position. Curves: normal
readers, poor readers who scanned, and poor
readers who did not scan.)

As hypothesized, the performance of dyslexic
boys (scanners and nonscanners combined)
was inferior to that of normal boys
in all but the last few positions. This
Ability × Position interaction was confirmed
with a 2 × 8 repeated measures
analysis of variance (F = 2.7, df = 7/322,
p < .01). In addition, significant main effects
were obtained for reading ability (F =
26.6, df = 1/46, p < .001) and for serial
position (F = 13.7, df = 7/322, p < .001).

As noted above, the higher incidence of
nonscanners in the dyslexic group indicates
that fewer dyslexic boys employed cumulative
rehearsal than normal boys. This interpretation
is supported by the failure to
obtain a primacy effect for dyslexic nonscanners,
while clear primacy effects were
obtained for normals and dyslexic scanners.
If, as is commonly believed, a primacy effect
is due to cumulative rehearsal of initial
items (Rundus & Atkinson, 1970; Tulving,
1968), these data indicate that nonscanners
do not cumulatively rehearse.

Further evidence for this interpretation
may be seen in the apparent interaction of
scanning with serial position. Comparing
serial-position curves of scanners and nonscanners
only, scanners recalled more in primacy
and middle positions, but less in recency
positions. This would be expected if
nonscanners do not rehearse and if scanners
continue to rehearse items from the first
part of the list while the last few items are
presented. The Scanning × Position interaction
was confirmed with a 2 × 8 repeated
measures analysis of variance (F = 2.6,
df = 7/154, p < .025). In addition, the main
effect for serial position was significant
(F = 12.1, df = 7/154, p < .001), but the
main effect for scanning was not significant
(F = 1.0, df = 1/22).


Correlates of Scanning 


If cumulative rehearsal is blocked by un- 
usually slow encoding and if scanning is a 
valid measure of rehearsal, there should be 
a significant point-biserial correlation be- 
tween scanning and naming speed. This was 
tested, for poor readers only, with two cor- 
relations that differed by their use of verbal 
or concrete stimuli to measure naming 
speed. For the correlation involving verbal 
stimuli, digit-naming speed was computed 
by averaging scores from alternate forms of 
the digit-naming test. For the correlation involving
concrete stimuli, color-picture-naming
speed was computed by averaging scores
from the color and picture tests. Scanning
was coded as a two-value categorical variable.
Contrary to expectation, the correlation
of scanning and digit-naming speed
was not significant (r = .18, F < 1, df =
1/22). The correlation of scanning and
color-picture-naming speed, however, was
highly significant (r = .57, F = 10.6, df =
1/22, p < .005). To illustrate the latter
relation, naming speeds of poor readers are
ranked in Table 1. Beside each entry is a
scanning code. It may be seen that a naming
speed of one item per second separates scanners
from nonscanners with 88% accuracy.
The point-biserial correlation of scanning
with age, also computed for dyslexic boys,
was not significant (r = .23, F = 1.2, df =
1/22).


TABLE 1
SCANNERS AND NONSCANNERS RANKED BY
NAMING SPEED


Color-picture-               Color-picture-
naming speed(a)  Scanning(b)   naming speed(a)  Scanning(b)

 .67               0            1.01              1
 .68               0            1.02              1
 [illegible]       0            1.03              1
 .45               0            1.07              1
 .76               0            1.08              1
 .78               1            1.14              1
 .80               1            1.14              1
 .83               0            1.17              1
 .83               0            1.17              1
 .85               0            1.36              0
 .89               0            1.40              1
 .96               0            1.41              1


Note. This refers to dyslexic boys only.
(a) Items per second.
(b) Nonscanners coded 0; scanners coded 1.
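The F ratios reported beside each correlation can be recovered from r and its error degrees of freedom, F = r² · df / (1 − r²), and the 88% figure can be recovered from the scanning codes in Table 1. The following is a minimal Python sketch of both checks, not the authors' procedure. It assumes 24 dyslexic boys (error df = 22) and reads the one garbled scanning code in the right-hand column of Table 1 as 1; the function name `f_from_r` is illustrative only.

```python
def f_from_r(r, df_error):
    """F ratio on (1, df_error) degrees of freedom for testing r against zero."""
    return (r ** 2) * df_error / (1.0 - r ** 2)

# Correlations of scanning with digit naming, color-picture naming, and age.
for label, r in [("digit naming", 0.18),
                 ("color-picture naming", 0.57),
                 ("age", 0.23)]:
    print(f"{label}: r = {r:.2f}, F(1, 22) = {f_from_r(r, 22):.1f}")

# Scanning codes from Table 1 (0 = nonscanner, 1 = scanner), split at a
# naming speed of one item per second.
below_one = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # speeds under 1.0
above_one = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]  # speeds of 1.0 or more (fifth code assumed)

# A boy is classified correctly if he is a nonscanner below the cutoff
# or a scanner at or above it.
correct = below_one.count(0) + above_one.count(1)
accuracy = correct / (len(below_one) + len(above_one))
print(f"classification accuracy = {accuracy:.3f}")
```

Under these assumptions the sketch gives F values close to the reported < 1, 10.6, and 1.2, and 21 of 24 boys classified correctly, which rounds to the 88% quoted in the text.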

These results are puzzling. Why should 
digit rehearsal be significantly related to en- 
coding speed for colors and pictures but not 
to encoding speed for digits? The corre- 
lations suggest that children do not re- 
hearse digits, regardless of their encoding 
ability for digits, until they are able to re- 
hearse colors and pictures. Since encoding 
speed develops more slowly for concrete 
items (colors and pictures) than for verbal 
items (digits), the emergence of cumulative 
rehearsal in a child's behavior, regardless of 
the type of item to be rehearsed, appears to 
be delayed until he is able to encode concrete
items at speeds above some minimum
value. Judging from the present data, this 
minimum value is about one item per sec- 
ond. 


Correlates of Long-Term Storage 


It was found that recall probabilities of
normals and dyslexics (scanners and nonscanners
combined) differed in all but the
last two positions. According to a two-storage
model, this pattern indicates a long-term-storage
difference. It was hypothesized
that this long-term-storage difference is
caused by a rehearsal deficit. From results
already presented, it appears that slow
speech-motor encoding may limit cumulative
rehearsal or block it completely. In
the first case, recall should be related to
digit-naming speed. In the second case, it
should be related to scanning. To determine
the relative strengths of these relationships, 
the multiple correlation of long-term storage 
with digit-naming speed and scanning was 
computed. Scanning was coded as a two- 
value categorical variable. Long-term stor- 
age was determined for each subject by 
totaling the number of correct first-choice 
responses in Serial Positions 1 through 6. 
The coefficient-alpha reliability of long-term
storage was an r_xx of .65, indicating a
theoretical maximum correlation of long-term
storage with other variables of .80
(√r_xx). The obtained multiple correlation
of .77 was very near the theoretical upper
limit. Putting it another way, digit-naming
speed and scanning together accounted for
91% of the true variance of long-term storage
(100 R²/r_xx). The relative influence of
these variables was examined with a com- 
monality analysis (Kerlinger & Pedhazur, 
1973). The results, shown in Table 2, in- 
dicate that the variables commonly ac- 
counted for a significant portion of the 
variance of long-term storage. Moreover, 
each variable uniquely accounted for an 
additional significant portion of long-term 
storage variance. The portion unique to 
digit-naming speed, however, was much 
larger than the portion unique to scanning. 
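The arithmetic in this paragraph can be retraced in a few lines. The Python sketch below computes the reliability ceiling, the percentage of true variance, and a two-predictor commonality partition. It assumes that the "Total" row of Table 2 gives each predictor's squared correlation with long-term storage (.53 and .31) and uses the standard two-predictor commonality formulas; the variable names are illustrative, and the results match Table 2 only within rounding.

```python
import math

r_xx = 0.65   # coefficient-alpha reliability of long-term storage
R = 0.77      # obtained multiple correlation

# Ceiling on any correlation with a criterion of reliability r_xx.
ceiling = math.sqrt(r_xx)

# Percentage of reliable ("true") variance accounted for: 100 R^2 / r_xx.
true_variance_pct = 100 * R ** 2 / r_xx

# Commonality analysis with two predictors (assumed reading of Table 2).
r2_digit = 0.53   # digit-naming speed alone
r2_scan = 0.31    # scanning alone
R2 = R ** 2       # both predictors together

unique_digit = R2 - r2_scan            # added by digit naming over scanning
unique_scan = R2 - r2_digit            # added by scanning over digit naming
common = r2_digit + r2_scan - R2       # shared by the two predictors

print(f"ceiling = {ceiling:.2f}, true variance = {true_variance_pct:.0f}%")
print(f"unique to digit naming = {unique_digit:.2f}")
print(f"unique to scanning     = {unique_scan:.2f}")
print(f"common                 = {common:.2f}")
```

The unique portions come out at about .28 and .06, as in Table 2; the common portion comes out near .25 rather than the tabled .26, a difference attributable to rounding of the inputs.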


CONCLUSIONS 


Data indicated the following: (a) Dyslexics
were slower than normals and
younger boys were slower than older boys
on naming speed tests; (b) these differences
were larger for digits (a verbal stimulus)
than for colors and pictures (concrete stimuli);
(c) subjects in both groups named
digits faster than colors and pictures; (d)
almost all of the normals, but barely half of
the dyslexics, employed left-to-right visual
scanning during the probe task; (e) there
were primacy effects in the serial-position
curves of normals and dyslexic scanners but
not of dyslexic nonscanners; (f) speed of
naming colors and pictures discriminated
between dyslexic scanners and nonscanners,
with lower speeds for nonscanners; (g)
probability of correct recall was greater for
normals than dyslexics (scanners and nonscanners)
in all but the most recent two
serial positions; (h) almost all (91%) of the
true variance of memory performance in
Serial Positions 1 through 6 was predicted
by speed of digit naming and employment
of forward scanning; and (i) digit naming
was a more powerful predictor of memory
performance than forward scanning.


TABLE 2
PROPORTION OF VARIANCE OF LONG-TERM STORAGE
ASSOCIATED WITH DIGIT-NAMING SPEED
AND SCANNING


                           1. Digit-
Source                     naming speed   2. Scanning   F(a)

Part common to 1 and 2     .26            .26           26.8**
Part unique to 1           .28                          30.0**
Part unique to 2                          .06            6.4*
Total                      .53            .31

(a) df = 1/44.
* p < .025.
** p < .001.


These data are interpreted as follows. The 
interaction of reading ability with serial 
position indicates that short-term storage is 
equally robust for dyslexics and normals 
but that long-term storage is impaired for 
dyslexics. As hypothesized, impaired long- 
term storage is strongly related to slow 
speech-motor encoding, presumably because 
slow encoding preempts time that would 
otherwise be available for rehearsal. It ap- 
pears that slow encoding acts in two ways. 
It may merely limit rehearsal or block it 
completely. When digits are used in the 
memory task, the time available for re- 
hearsal is limited by the subject’s encoding 
speed for digits. Whether rehearsal will even 
be attempted, however, is predicted best by 
speed of naming concrete stimuli such as 
colors and pictures, perhaps because re- 
hearsal is a habit that is not acquired until 
children are able to encode any type of 
labeled stimulus at a speed in excess of a
threshold value. The present data indicate a 
threshold value of about one item per sec- 
ond. 

This analysis probably does not apply 
equally well to all poor readers. Poor 
readers with IQs above 90 were selected for 
the present study. There are many more 
school children with IQs below 90 who also
have serious reading problems. Unlike the 
children studied in the present investigation, 
however, many low-IQ children perform in 
the normal range on simple learning tasks 
such as serial memory, although they are 
less skilled on more complex tasks requiring 
abstract reasoning (Jensen, 1973). Their 
low IQs apparently reflect the fact that in- 
telligence tests are heavily loaded with 
items requiring abstract reasoning. Dyslexic 
children, on the other hand, have WISC 
subtest scores indicating impaired serial 
memory but normal performance on tasks 
requiring abstract reasoning. The conclu- 
sions and interpretations presented in this 
report are limited to the latter group. 
PRESENTATIONS OF INSTRUCTIONAL OBJECTIVES

The purpose of the present study was to determine the effects of
part versus whole presentations of objectives and text upon intentional 
and incidental prose learning. These effects were examined for three 
passage lengths with specific or general objectives and with two 
densities of objectives. The results showed that presentation of objec- 
tives generally facilitated intentional learning with either increases or 
no reduction in incidental learning. The part presentation resulted in 
greater likelihood of learning intentional items than the whole pres- 
entation with no significant loss of incidental learning. Neither 
specificity nor density of objectives was found to influence performance.
More inspection time was used for part presentations, longer
passages, specific objectives, and larger densities.


Providing instructional objectives to stu- 
dents prior to text as directions to learn has 
been shown to enhance both intentional 
(objective-relevant) and incidental (non- 
objective-relevant) learning (Rothkopf & 
Kaplan, 1972). Further, Kaplan and Roth- 
kopf (1974) found that when objectives 
were inspected as a whole, the likelihood 
of learning objective-relevant material was 
greater with (a) shorter passages, (b) fewer 
objectives, and (c) more specifically phrased 
objectives. However, instructional materials 
are seldom only a few pages in length and 
contain a small proportion of objective-relevant
material. Further, the desirability
of short passages and few objectives conflicts
with the desirability of specifically
phrased objectives. That is, a greater num- 
ber of specific objectives is required to 
cover the same instructional points than 
the more generally phrased objectives. 
Therefore, the type of instructional material 
that students are usually required to learn 
is at variance with the best use of objectives
suggested in the Kaplan and Rothkopf
(1974) study. The problem is one of
finding a way to provide objectives such 
that the likelihood of learning remains high 
for large texts containing a large quantity of 
objective-relevant information. One tech- 
nique that could resolve this problem is to 
distribute partial lists of objectives among 
corresponding text segments. This part pres- 
entation method would permit the learner to 
consider short text segments with relatively 
few specifically phrased objectives. There 
is some evidence indicating that this general 
procedure can enhance learning when ques- 
tions are distributed throughout text seg- 
ments (Frase, 1967; Rothkopf & Bisbicos, 
1967). However, these studies did not in- 
vestigate the whole presentation of questions 
or objectives (all objectives presented prior 
to the complete text) as in the Kaplan and 
Rothkopf (1974) study. Papay (1971) did 
investigate part versus whole presentations 
of objectives and text. He reported “incon- 
clusive” results for intentional learning with 
respect to part versus whole presentations, 
and no comparisons were made for non- 
objective-relevant learning. In addition, 
Papay’s finding was based on 28 objectives 
for a 3,500-word text. The Kaplan and 
Rothkopf (1974) findings for whole pres- 
entation were based on a larger proportion 
of objectives to text sentences (e.g., from 22, 
34, 41, or 48 objectives for a 56-sentence
text to 68, 104, 124, or 146 objectives for a
169-sentence text).

The purpose of the present study was to 
determine if performance increments found 
with whole presentations of objectives prior 
to text (Kaplan & Rothkopf, 1974) could 
be maintained when objectives were dis- 
tributed among text segments. In addition, 
the effects upon learning of passage length, 
specificity of stating objectives, and density 
(proportion of objective-relevant text sen- 
tences to total text sentences) were investi- 
gated with part and whole presentations of 
objectives and text. 


METHOD 


Materials 


Passages. Three basic passages used in previous 
studies (Kaplan & Rothkopf, 1974; Rothkopf & 
Kaplan, 1972) were used in the present study.² The
passages were 60, 54, and 55 sentences in length 
(X = 56 sentences). These individually presented 
passages were designated Passages 1, 2, and 3, and 
they comprised the Length 56 condition. The 
Length 56 passages were then paired to form a 
Length 113 condition. Three pairs of Length 56 
passage combinations (2,1; 2,3; and 3,1), with a 
mean of 113 sentences, were used. Finally, a 
Length 169 condition was developed by combin- 
ing all three Length 56 passages in two different 
sequential orders (passage combinations 2,1,3 and 
2,3,1). The particular passage combination
sequences used for Length 113 and 169 were
[illegible] flow of information.

Objectives. The objectives were the same as
those used in the two studies previously mentioned.
Objectives were defined as directions to
the subject to learn specific instructional points.
Two types of objectives were prepared: specific
and general. A specific objective was one that
required the subject to learn one passage sentence.
Thus, the number of specific objectives equaled
the number of objective-relevant passage sentences.
The match between objectives and passage
sentences was empirically determined in a previous
experimental study (see Rothkopf & Kaplan,
1972). Sets of 2-5 specific objectives were written
for adjacent passage sentences. This permitted a
single, generally phrased objective to be written
for the same 2-5 adjacent sentences. Thus, a given
set of 2-5 passage sentences was associated with
either 2-5 specific objectives or with 1 general
objective. The total number of specific objectives
written for each of the three Length 56 passages
was 36, 33, and 34 objectives. Combinations of
these objectives were used for Lengths 113 and
169 such that the mean number of specific objectives
per passage length was 34, 69, and 103, respectively.
Similarly, the mean number of general
objectives written for each of the Length 56
passages was 14, 12, and 8. The mean number of
general objectives for Passage Lengths 56, 113, and
169 was 11, 23, and 34, respectively. The objectives
always appeared in the same sequential order as
the passage sentences.


² Appreciation is extended to Mr. F. L. Stevenson
for the use of materials from Systems Training
Department courses at Bell Laboratories.

Density was defined as the proportion of objective-relevant
sentences to the total number of
passage sentences. The use of all objectives (i.e.,
103 specific or 34 general objectives) for Passage
Length 169 resulted in a 60% density. That is, 103
objective-relevant sentences out of 169 passage
sentences resulted in 60% of the passage sentences
being objective-relevant. Subsets of these objectives
were selected for Length 56 and 113 in
order to maintain a 60% density across passage 
length. Table 1 shows the number of objectives 
and corresponding objective-relevant sentences 
used for Density 60%. A second density (40%) was 
achieved by selecting subsets of objectives from 
Density 60% such that 40% of the passage sen- 
tences were objective-relevant across all passage 
lengths (Table 1). Both density levels were con- 
structed by selecting subsets of specific and match- 
ing general objectives that were relevant to the 
same passage sentences. Those sentences remaining,
which were not relevant to any objective, were
used to measure incidental learning. However, only 
those sentences which were identical in densities 
40% and 60% (common sentences) were analyzed.
This procedure permitted measurement of in- 
tentional and incidental learning with the same 
test items for every experimental condition. 
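The density definition above is a simple ratio, and the nominal 40% and 60% levels can be checked against the sentence counts. The Python sketch below does this using the objective-relevant sentence counts as read from Table 1 and treating the mean passage lengths (56, 113, 169) as the totals, which is an assumption of this illustration; the function and variable names are hypothetical.

```python
def density(relevant_sentences, total_sentences):
    """Proportion of passage sentences that are objective-relevant."""
    return relevant_sentences / total_sentences

# Objective-relevant sentences per passage length, as read from Table 1.
relevant = {
    "40%": {56: 22, 113: 45, 169: 67},
    "60%": {56: 34, 113: 69, 169: 103},
}

for level, counts in relevant.items():
    for length, n_relevant in counts.items():
        print(f"Density {level}, Length {length}: "
              f"{density(n_relevant, length):.2f}")
```

Each computed proportion falls within about one percentage point of its nominal level, which is consistent with the statement that subsets of objectives were chosen to hold density constant across passage lengths.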

All of the objectives for a given treatment were
prepared in a list that preceded the entire passage.
This whole presentation treatment permitted subjects
to inspect all objectives and the passage
concurrently. The objectives' lists were then
divided into one-third segments for part presentation.
Similarly, the passages were divided into
one-third segments corresponding to the objective
segments. This procedure permitted inspection
of the passage and corresponding objective
segments. Both part and whole presentations were
prepared for all experimental treatments.


TABLE 1
MEAN NUMBER OF OBJECTIVES AND PASSAGE
SENTENCES PER TREATMENT


                        Density 40%                            Density 60%
Passage length   General      Specific    Objective-    General     Specific    Objective-
(total           objectives   objectives  relevant      objectives  objectives  relevant
sentences)                                sentences                             sentences

56               8            22          22            11          34          34
113              [illegible]  45          45            23          69          69
169              [illegible]  67          67            34          103         103


Tests. The tests consisted of prompted recall, 
fill-in-the-blank type items. A test question was 
written for almost every passage sentence. Per- 
formance on objective-relevant test questions was 
considered a measure of intentional learning. Per- 
formance on nonobjective-relevant test questions 
was considered a measure of incidental learning. 
The total number of questions written for each 
Length 56 passage was 56, 52, 51. Combinations of 
these Length 56 test questions were used for 
Lengths 113 and 169. Three separate random orders
of test questions were prepared for each passage 
length. 


Analyses 


Three 2 × 3 × 2 × 2 analyses of variance were
performed with two levels of presentation (part
and whole), three levels of passage length (56, 113,
and 169 sentences), two levels of objective specificity
(specific and general), and two levels of
density (40% and 60%). Separate analyses were
performed for intentional learning, incidental learning,
and inspection time. Eighteen subjects were
assigned to each of 24 treatments (N = 432). In
addition, 108 subjects served in six reference groups
who read the passages without objectives (18
subjects with part or whole texts per passage
length).
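The cell and subject counts follow directly from crossing the factor levels. As a minimal Python sketch of that bookkeeping (the dictionary keys and variable names are illustrative labels, not terms from the study materials):

```python
from itertools import product

# Factor levels of the 2 x 3 x 2 x 2 design.
factors = {
    "presentation": ["part", "whole"],
    "length": [56, 113, 169],
    "specificity": ["specific", "general"],
    "density": ["40%", "60%"],
}

# Every combination of levels is one treatment cell.
cells = list(product(*factors.values()))
subjects_per_cell = 18
n_treatment = len(cells) * subjects_per_cell

# Reference groups cross presentation with passage length only.
reference_groups = list(product(factors["presentation"], factors["length"]))
n_reference = len(reference_groups) * subjects_per_cell

print(len(cells), n_treatment)             # treatment cells, treatment subjects
print(len(reference_groups), n_reference)  # reference groups, reference subjects
print(n_treatment + n_reference)           # total sample
```

The totals agree with the 24 treatments, N = 432, six reference groups of 108 subjects, and the 540 volunteers described in the text.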


Procedure 


Approximately 100 subjects participated in each 
experimental session. The experimental sessions 
were conducted in each high school cafeteria after
the last school period and lasted for about one 
and a half hours. The materials were packaged 
in manila envelopes with the contents of each en- 
velope being inspected individually. The part treat- 
ment groups received five envelopes, each of which 
contained instructions and (a) the first one third 
of objectives and passage, (b) the second one 
third of objectives and passage, (c) the last one 
third of objectives and passage, (d) a test, and 
(e) supplementary material to occupy subjects 
who completed the experimental task early. The
whole-treatment groups received three envelopes
consisting of (a) a complete list of objectives and
a complete passage, (b) a test, and (c) supplementary
material. Half of the reference groups
received the same three envelopes as the whole-treatment
groups, except that there were no objectives
in Envelope 1. The second half of the
reference groups received the same five envelopes
as the part-treatment groups, except that these envelopes
contained no objectives. The experimental
subjects were told that they would be tested only
on the objective-relevant material. However, they
were tested on almost every passage sentence. This
procedure permitted measurement of both inten- 
tional (objective-relevant) and incidental (non- 
objective-relevant) learning. All subjects were 
permitted to read at their own pace and were 
instructed to record their start and stop inspection
times. Two digital clocks with 2 × 5 inch
numerals were provided for this purpose. 

Subjects. The subjects were 540 paid volunteers 
(181 males and 359 females between 15 and 18 
years of age) from six New Jersey high schools. 


RESULTS


Intentional Learning 


Figure 1 summarizes the means from 
several analyses. The data from the intentional
learning analysis, for test items that
were common to all treatments, are summarized
in the top pair of curves in Figure
1. Arc sine transformations were used on
proportion scores.
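The arc sine transformation is commonly applied to proportions before analysis of variance because it stabilizes their variance. The text does not say which variant was used, so the Python sketch below assumes the plain form asin(√p) in radians; some sources use 2·asin(√p) instead, which differs only by a constant factor. The function names and the sample proportions are illustrative.

```python
import math

def arcsine_transform(p):
    """Angular transformation asin(sqrt(p)), in radians, of a proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.asin(math.sqrt(p))

def back_transform(angle):
    """Inverse of arcsine_transform: recover the proportion from the angle."""
    return math.sin(angle) ** 2

# Illustrative proportions of correct responses.
for p in (0.25, 0.50, 0.75):
    angle = arcsine_transform(p)
    print(f"p = {p:.2f} -> angle = {angle:.4f} -> back = {back_transform(angle):.2f}")
```

Analyses run on the transformed scale while reported means are usually back-transformed, which is why the means in the text are still expressed as proportions.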

Part presentation (X = .39) resulted in
greater learning than whole presentation
(X = .35; F = 4.49, df = 1/408, p < .05).
The passage length main effect was highly
significant (F = 23.04, df = 2/408, p <
.001). Paired comparisons using the Newman-Keuls
technique showed that Length
56 (X = .46) resulted in greater learning
than Length 113 (X = .34; q = 7.21, df =
408, r = 2, p < .01) and Length 169 (X =
.31; q = 8.89, df = 408, r = 2, p < .01).
Length 113 was not significantly different
from Length 169. This finding held for both
the part and whole presentations when
analyzed separately. Neither specificity nor
density had a significant effect.


Figure 1. Mean proportion of correct responses
for intentional and incidental test items, which
were common to all treatments, as a function of
passage length, and part versus whole presentations.
(Nonobjective reference groups' proportion
scores are also shown as a function of passage
length and mode of presentation.)


Comparisons between the treatment 
groups and reference groups were made with 
t tests. The treatment groups’ intentional 
learning was greater than the reference 
groups’ learning for all three passage 
lengths, for both part and whole presenta- 
tions. 


Incidental Learning 


The middle pair of curves in Figure 1 
shows the main results for the analysis of 
incidental learning. 

There was no significant difference for
incidental learning between part and whole
presentations. The passage length main effect
was significant (F = 20.27, df =
2/408, p < .001). Similar to intentional
learning, paired comparisons showed that
performance with Length 56 (X = .36) was
greater than with Length 113 (X = .28;
q = 6.53, df = 408, r = 2, p < .01) and
Length 169 (X = .26; q = 8.47, df = 408,
r = 3, p < .01). Length 113 was not significantly
different from Length 169. Neither
specificity nor density had a significant
effect.

The treatment groups' incidental learning
was significantly greater than the reference
groups' incidental learning with part presentations
for Length 56 only (t = 3.79, df =
88, p < .01). No significant difference was
found for Lengths 113 or 169. However, the
treatment groups' incidental learning was
significantly greater than the reference
groups' incidental learning with whole presentations
for Lengths 113 and 169 (t = 2.87,
df = 88, p < .01, and t = 3.00, df = 88, p <
.01, respectively). No significant difference
was found for Length 56.


Inspection Time


Some of the time data were unusable due
to subject errors in recording times. Therefore,
data for only 450 subjects were analyzed
(treatment groups: n = 360; reference
groups: n = 90). Log transformations
were performed on inspection time. Figure
2 shows the main results of this analysis.

Passages and objectives with part presentations
(X = 25.76) required more reading
time than with whole presentations
(X = 21.58; F = 6.71, df = 1/336, p < .01).
Shorter passages required less reading time
than longer passages (Length 56, X = 17.48;
Length 113, X = 23.54; Length 169, X =
29.99; F = 4.10, df = 2/336, p < .001). General
objectives (X = 21.55) required less
time than specific objectives (X = 25.78;
F = 13.85, df = 1/336, p < .001). Density
40% (X = 21.93) required less time than
Density 60% (X = 25.45; F = 8.47, df =
1/336, p < .01). None of the interactions
was significant.

Inspection time comparisons between
each reference group and the corresponding
treatment group were made with t tests.
The only significant difference in the part
presentation occurred at Length 169 where
the treatment group (X = 33.21) required
more time than the reference group (X =
22.65; t = 2.31, df = 33, p < .05). The
treatment group (X = 16.63) also required
more time than the reference group (X =
8.07) with the whole presentation at Length
56 (t = 2.59, df = 73, p < .01). In addition,
the treatment group (X = 26.77) required
more time than the reference group (X =
24.11) with the whole presentation at Length
169 (t = 2.76, df = 73, p < .01).


Figure 2. Antilog of mean log inspection time
for part versus whole presentations for TREAT.
(treatment) and REF. (reference) groups as a function
of passage length.
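The antilog of a mean log time, the quantity plotted in Figure 2, is the geometric mean of the raw times, and it is pulled upward less by a few very slow readers than the arithmetic mean is. The Python sketch below illustrates the computation; the sample times are hypothetical values chosen for illustration, not data from the study, and base-10 logarithms are an arbitrary choice since the result does not depend on the base.

```python
import math

def antilog_of_mean_log(times):
    """Geometric mean: average the log times, then take the antilog."""
    logs = [math.log10(t) for t in times]
    return 10 ** (sum(logs) / len(logs))

# Hypothetical inspection times in minutes (not data from the study).
sample = [10.0, 20.0, 40.0]

arithmetic_mean = sum(sample) / len(sample)
geometric_mean = antilog_of_mean_log(sample)

print(f"arithmetic mean = {arithmetic_mean:.2f} minutes")
print(f"antilog of mean log = {geometric_mean:.2f} minutes")
```

For positively skewed times the antilog of the mean log is never larger than the arithmetic mean, so the plotted values should be read as typical rather than average reading times.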


Discussion 


The primary finding of this study was 
that performance increments found with 
whole presentations of instructional objec- 
tives could be achieved with part presenta- 
tions of objectives and text. In fact, the part 
presentations resulted in even greater in- 
tentional learning than whole presentations. 

As expected, intentional learning with 
whole presentations was greater than the 
reference groups’ performance on the same 
test items for every passage length. This 
finding is consistent with the results of the 
Kaplan and Rothkopf (1974) study. Simi- 
larly, the greater intentional learning found 
with part presentations over the reference 
groups’ performance at every passage length 
is consistent with the findings of Frase 
(1967) and Rothkopf and Bisbicos (1967). 
These studies found intentional learning to 
be greater than a nonobjective control group 
when adjunct questions were distributed 
throughout a text. Incidental learning with 
whole presentations was found to be greater 
than the reference groups’ performance at 
Passage Lengths 113 and 169 but not at 
Passage Length 56. This finding also repli- 
cates Kaplan and Rothkopf (1974). Con- 
versely, incidental learning with part pres- 
entations was found to be greater than the 
reference groups' performance only at
Length 56. This was the result of a reversal
in the reference groups' performance with
part and whole presentations (Figure 1).
That is, the reference groups' performance
was greater with whole presentations at
Length 56 and with part presentations at
Lengths 113 and 169. This was predictable
in that learning has been shown to be greater
with whole presentations of short contextually
connected passages and with part
presentations of longer, less connected passages
(Ausubel, 1963). Conversely, no part/whole
effect was found for the treatment
groups' incidental learning. Both the treat- 
ment groups' performance on incidental 
items and the reference groups' total per- 
formance are based upon nonobjective- 
relevant text sentences. However, the refer- 
ence groups' performance is considered to be 
intentional learning in that they were instructed
to learn everything in the passage.
Thus, the reference groups read the passage
nonselectively, giving equal attention to 
each sentence. The treatment groups, on the 
other hand, read the text selectively giving 
less attention to incidental sentences. 

Both intentional and incidental learn- 
ing for part and whole presentations de- 
creased as a function of passage length. This 
finding, with respect to whole presentation, 
is again consistent with the Kaplan and 
Rothkopf (1974) study. Similar to that 
study and to Rothkopf & Kaplan (1972) 
was the finding that differences in specificity 
of stating objectives and in density were in 
the same direction (i.e., specific > general
objectives and Density 40% > Density 
60%), although the differences were not 
significant. 

The inspection time analysis shows that 
more time is used for part presentations, 
longer passages, specific objectives, and 
larger proportions of objectives to text. In 
addition, the treatment groups generally 
used more time than the reference groups 
with whole presentations (Lengths 56 and 
169). However, the treatment groups used 
more time than the reference groups with 
part presentations only at Length 169. The 
additional time generally used by treatment 
groups must be considered with respect to 
learning. When mastery of instructional 
material is desired, the use of objectives is 
shown to be beneficial even though more in- 
spection time is needed. Smaller segments of 
partial presentations (Lengths 56 and 113) 
would seem to maximize the likelihood of 
mastering any given objective while mini- 
mizing inspection time differences. 
The purpose of the study was to investigate the relationship between
cognitive style and school learning of fifth-grade children. The find- 
ings indicate that cognitive style was differentially related to school 
learning for both boys and girls after variance attributed to verbal and 
nonverbal IQ was taken into consideration in the data analysis. The 
results suggest that the relationship between a particular style and a 
particular school-learning variable may be an important consideration 
prior to assigning children to differential instructional treatments or to 
instructing children in the use of a particular style. 


The possible effects of individual differ- 
ences in cognitive style on school learning 
have been speculated upon since the notion
of cognitive style was introduced by Gard- 
ner 20 years ago (Gardner, 1953; Glaser, 
1972; Kagan, Moss, & Sigel, 1963; Messick, 
1970; Nunney & Hill, 1972; Wallach & 
Kogan, 1965; Witkin, Dyk, Faterson, Good- 
enough, & Karp, 1962). Empirical evidence, 
however, has led to few conclusions concern- 
ing the relationships among cognitive style, 
learning conditions, and learning outcomes 
(Coop & Sigel, 1971; Cronbach, 1968; 
Kogan, 1971). 

Although the relationship between cognitive
style and school learning may be
masked because individual differences in
style are not commonly matched with learning
conditions, a more basic problem may
lie in style tests themselves. Among the
problems most often cited are scoring based
on ipsative formats and the irrelevance of
performance on style tests in relation to
school learning tasks (Annesley, 1971; Brozovich,
Hall, & Watson, 1972; Davis, 1971;
Denmark, Havlena, & Murgatroyd, 1971;
[illegible], 1971; Gatewood, 1971; Huckabee,
1969; Scott, 1971; Wallach & Kogan, 1965).

The purpose of the present study was to
investigate further the relationship between
cognitive style and school learning of fifth-grade
children. A test was designed for this
study which featured independent scales for 
each cognitive style and tasks similar to 
school-learning tasks. Performance on a 
standardized achievement test was used as 
a measure of school learning. Since past 
studies have suggested that interactions 
among cognitive style, intellectual ability, 
and sex may mask the relationship between 
cognitive style and school learning, intel- 
lectual ability and sex were included as ad- 
ditional variables for investigation. Two 
general questions were used to guide the 
data analysis. First, what is the relationship 
between cognitive style and school learn- 
ing? Second, what additional contribution 
do measures of cognitive style add beyond 
that obtained from a traditional test of in- 
telligence in the prediction of school learn- 
ing? 
The cognitive style dimension under study 
was identified by Kagan et al. (1963). 
Kagan et al. defined cognitive style as 
“stable individual preferences in modes of 
perceptual organization and conceptual cat- 
egorization of the external environment [p. 
74]." These investigators subsequently iden- 
tified three cognitive styles among children: 
categorical, descriptive, and relational. Cat- 
egorical responses refer to the use of com- 
mon class membership in relating stimuli 
(e.g., a dog and a sheep are both animals). 
This style has been referred to as an in- 
ferential mode of conceptualization in 
that the use of abstract labels is a means
of summarizing the detailed relationship
among stimuli. Descriptive responses may
be defined as concepts formed on the basis
of shared physical attributes of stimuli
(e.g., a dog and a sheep both have four
legs). This style has been called an analyti- 
cal mode of conceptualization in the sense 
that an individual deals with similarities 
among the concrete detail of stimuli. Rela- 
tional responses are those in which func- 
tional relationships among stimuli are used 
in associating stimuli (e.g., a dog is used to 
drive sheep). The relational style has been 
referred to as a global, contextual, themati- 
cal mode in that an individual associates a 
whole stimulus with another whole stimulus 
in making an interdependent functional re- 
lationship rather than forming a concept. 
The definitions of each of these styles, char- 
acteristics of items on tests, and scoring 
criteria for classifying items from earlier 
investigations were used as the basis for 
designing the instrument of the present 
study (Achenbach, 1970; Brozovich et al.,
1972; Kagan et al., 1963; Wallach &
Kogan, 1965).


METHOD 
Subjects 


Two hundred and fifty-eight fifth-grade
children (132 boys, 126 girls) from 12 class-
rooms in five schools in a midwestern city
served as subjects.


Tests 


The Iowa Test of Basic Skills (ITBS) 
and the Lorge-Thorndike Intelligence Test 
were administered as a regular part of the 
school district’s evaluation program. Verbal 
and nonverbal deviation IQ scores for the 
Lorge-Thorndike Intelligence Test and 
grade-equivalent scores for each subtest of 
the ITBS were subsequently obtained by 
the experimenters. 

The cognitive style test consisted of verbal
analogy items.² Each cognitive style scale
(categorical, descriptive, and relational) in-
cluded 14 items. The cognitive style test was
administered orally by the subjects’ class-
room teachers and scored by the experi-
menters.

Verbal content was chosen for the cogni-
tive style test since most school-related tasks
are predominantly verbal. An analogy for-
mat was chosen for the test as this type of
format provides a means of assessing cogni-
tive style as a process variable. That is, a
subject must form a relationship between
the first two terms and then generalize that
relationship to the last two terms of the
analogy. A separate scale was constructed
for each style in order to derive independent
scores for each subject. Most instruments
used in this research area have consisted of
pictorial content, items involving a two-term
association, and ipsative or intraindividual
dependent scores for subjects.

A basic assumption underlying most re-
search has been that cognitive style is an
individual’s manner rather than one’s level
of intellectual functioning (Kogan, 1971).
Consequently, precautions were taken to
select words for expressing analogous rela-
tionships which were familiar to fifth-grade
subjects.

Reliabilities (Kuder-Richardson Formula
20) for the scales were categorical (C) .84;
descriptive (D) .80; and relational (R) .78
for the 258 subjects. Thus, subjects per-
formed fairly consistently on each of the
14-item scales.

² The following are examples of each type of
item (* indicates the correct alternative):
Categorical:
Dog is to cat as chicken is to _____
a. feather  b. eggs  *c. pig  d. bark
Descriptive:
Chair is to legs as lamp is to _____
a. furniture  b. light  c. hand  *d. light bulb
Relational:
Key is to lock as saw is to _____
a. keys  *b. board  c. tool  d. teeth


RESULTS AND DISCUSSION


Means and standard deviations of style,
IQ, and school-learning variables are re-
ported in Table 1 for boys, girls, and the
total sample. Cognitive style scores are
given in raw score form, verbal and non-
verbal IQ scores in deviation IQ units, and
school learning scores in grade-equivalent
units. An inspection of the means and stan-
dard deviations indicates that they were
fairly uniform across sex for all variables.
While girls obtained slightly higher means
on all variables, the standard deviations
were approximately the same for each sex.

TABLE 1
DESCRIPTIVE STATISTICS

                           Total (n = 258)    Boys (n = 132)    Girls (n = 126)
Variable                     Mean     SD        Mean     SD        Mean     SD
Categorical style             9.8     3.6        9.5     3.7       10.2     3.5
Descriptive style             9.2     3.4        8.7     3.4        9.8     3.3
Relational style              9.3     3.2        9.0     3.3        9.7     3.1
Verbal IQ                    99.4    14.1       97.0    14.2      102.0    13.6
Nonverbal IQ                107.3    16.7      105.8    16.6      108.9    16.6
Vocabulary                    4.5     1.5        4.2     1.6        4.9     1.5
Reading Comprehension         4.4     1.4        4.1     1.3        4.8     1.4
Spelling                      4.4     1.6        4.0     1.4        4.9     1.7
Capitalization                4.7     1.7        4.4     1.4        5.0     1.8
Punctuation                   4.5     1.7        4.0     1.4        5.0     1.8
Language Usage                4.3     1.6        4.1     1.5        4.6     [illegible]
Map Reading                   4.5     1.3        4.5     1.3        4.6     1.3
Graphs and Tables             4.6     1.4        4.5     1.4        4.6     1.5
Reference Materials           4.5     1.4        4.2     1.2        4.8     1.5
Mathematical Concepts         4.6     1.2        4.5     1.1        4.6     1.2
Mathematical Problems         4.5     1.2        4.4     1.2        4.7     1.3
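
The Kuder-Richardson Formula 20 reliabilities reported for the three style scales can be illustrated with a minimal sketch. The response matrix below is hypothetical (six subjects, four right/wrong items) and is not data from this study. The function name `kr20` is chosen here for illustration, and the use of the population variance of total scores is an assumption; some treatments use the sample variance instead.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per subject)."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    # Proportion of subjects answering each item correctly (p); q = 1 - p.
    p = [sum(row[j] for row in responses) / n_subjects for j in range(n_items)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of the subjects' total scores (an assumption, see lead-in).
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical responses, not taken from the study.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(f"KR-20 = {kr20(example):.3f}")
```

Values near 1 indicate that subjects who pass one item tend to pass the others, which is the sense in which the text describes performance as fairly consistent within each 14-item scale.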
Table 2 presents the correlation coeffi-
cients among style, IQ, and school-learning
variables for the total sample. When the
same set of correlations was calculated
separately for each sex, only two correla-
tion coefficients differed significantly.
The correlations between reading
comprehension and language usage and be-
tween descriptive style and vocabulary were
significantly higher (p < .05) for girls than
for boys.

A series of multiple regression analyses
was performed to study the unique contri-
bution of each cognitive style measure in the
prediction of school learning. The results of
this series of analyses are reported in Table
3 for each sex. Six of the ITBS subtests
(Vocabulary, Reading Comprehension, Cap-
italization, Map Reading, Graphs and Ta-
bles, and Mathematical Concepts) involved
multiple cognitive style orientations based
on significant style contributions across sex.
The five remaining subtests were less com-
plex in cognitive style requirements.


Descriptive style contributed significantly 
in 19 of the 22 analyses for both sexes. A 
reversal occurred, however, for categorical 
and relational styles. For boys, relational 
style accounted for 7 of 10 additional sig- 
nificant style contributions. Seven of the 9 
significant style contributions for girls were 
for categorical style. 

The only school-learning variable for 
which a style other than descriptive entered 
the regression equation first for both sexes 
was mathematical concepts. Categorical 
style was the first style to enter the equa- 
tion for this subtest. In the other two 
instances (Reading Comprehension and 
Graphs and Tables) in which descriptive 
style did not enter the regression equation 
first, relational style entered first for boys. 
These were the only two subtests in which 
relational style made a significant contri- 
bution in relation to school learning for girls. 

An additional series of multiple regres- 
sion analyses was performed in order to 
investigate the unique contribution of cogni- 
tive style in the prediction of school learn- 
ing after the effects of verbal and nonverbal 
IQ were removed from the school-learning 
variables. Fifteen independent variables, in- 
cluding 10 interaction terms (e.g., Descrip- 
tive X Verbal IQ), were used in this series of 
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TABLE 3 


RESULTS OF MULTIPLE REGRESSION ANALYSES OF COGNITIVE STYLES IN RELATION TO
SCHOOL LEARNING

                                       Proportion of variance
                               Removed by first style   Attributed to second style variable
Variable                 Sex   Style     r      r²      Style   Increase in r²      F
Vocabulary               boy     D      .44    .19        R          .03          4.92*
                         girl    D      .63    .40        C          .04          8.04**
Reading Comprehension    boy     R      .51    .26        C          .03          5.60*
                         girl    D      .58    .34        R          .02          4.28*
Spelling                 boy     D      .52    .27
                         girl    D      .59    .38        C          .03          6.27*
Capitalization           boy     D      .56    .31        R          .02          4.52*
                         girl    D      .48    .23        C          .03          4.61*
Punctuation              boy     D      .50    .25        R          .02          4.31*
                         girl    D      .57    .33
Language Usage           boy     D      .51    .26
                         girl    D      .59    .34
Map Reading              boy     D      .58    .34        R          .05          9.34**
                         girl    D      .48    .23        C          .03          5.57*
Graphs and Tables        boy     R      .50    .25        C          .04          7.92**
                         girl    D      .45    .20        R          .03          4.16*
Reference Materials      boy     D      .40    .16
                         girl    D      .58    .34
Mathematical Concepts    boy     C      .54    .29        R          .05         10.00**
                         girl    C      .58    .34        D          .03          5.70*
Mathematical Problems    boy     D      .55    .30
                         girl    D      .54    .29        C          .03          6.24*

Note. Abbreviations: D = descriptive, R = relational, and C = categorical.
In no instance did the entry of a third style measure contribute significantly
in the prediction of school learning at the .05 level.
* Significant at .05 level.
** Significant at .01 level.


analyses. For each analysis, verbal and non- 
verbal IQ were forced into the equation, 
then the remaining 13 independent variables 
entered freely into the regression equation. 
The results of this series of analyses are 
presented in Table 4. In order to interpret
significant interactions between style and
IQ, the zero-order correlations between cog-
nitive style and school learning for different
IQ levels were calculated. The interactions
between IQ and a particular style were
relatively consistent across sex and school-
learning variables. The relationships be-
tween relational style and reading compre-
hension, map reading, and graphs and tables
were higher in the upper-third IQ level than
in the lower two-thirds IQ levels for boys.
For categorical style, the relationship be-
tween style and mathematical concepts was
higher in the upper two-thirds IQ levels
than in the lower-third IQ level for both sexes.
The relationships between descriptive style
and the other school-learning measures were
higher in the upper-third and lower-third
IQ levels than in the middle-IQ level for
both sexes. The only exceptions to these
trends were that the relationships between
descriptive style and punctuation and de-
scriptive style and language usage were
higher for the middle-IQ level than for the
high- and low-IQ levels for girls. A comparison of
Tables 3 and 4 indicates that the relation- 
ships between relational style and school 
learning for boys and between descriptive 
style and school learning for both sexes 
were still present after the effects of verbal 
and nonverbal IQ were partialed out of the 
school-learning measures. The relationship 
TABLE 4
PROPORTION OF VARIANCE ATTRIBUTED TO COGNITIVE STYLE AFTER VARIANCE ATTRIBUTED TO
VERBAL (V) AND NONVERBAL (NV) IQ IS REMOVED FROM SCHOOL LEARNING
[title and column headings partly illegible in the source]

                                  Proportion of variance
                         Attributed to V and NV IQ   Attributed to additional variables
Variable          Sex         R        R²            Variable entered   Increase    F
.43 
lcs be 7" .50 D XVIQ id a P 
Reading Comprehension boy .65 .43 ^ x V IQ (^ vate 
i .03 9. 
girl 75 .56 D x VIQ Te Hs 
VI .02 4.88* 
Spelling boy .68 AT DXN IQ m ioe 
girl A43 54 D X NV IQ i 
Capitalization boy .67 45 D X NV IQ .04 9. 
e &||npxmxv IQ | .02 | 5.2 
Punctuation "is 66 ‘4 D x NV IQ .05 D 
Language Usage boy .56 .92 p x V IQ i^ 5.74, 
i VI .09 | 19.35* 
e£ ^ xd D pna .02 5.028 
v oe 
Map Reading boy up: .52 E x V IQ e rs 
girl 71 -50 á 
Graphs and Tables boy .64 AL B X V IQ ui Dd 
girl .60 .36 
Reference Materials boy .69 AT 
girl 65 42 | Dx NVIQ .06 14.74 
D .02 4.59 
Mathematical Concepts boy .74 54 c x NV IQ [^ 6 P. 
girl mm 4 | CXNVIQ | .04 9.69** 
Cc .04 12.40. 
Mathematical Problems boy .70 .50 D x NV IQ .03 7.28 
girl E 55 


Note. Abbreviations: D = descriptive, R = relational, and C = categorical.
* Significant at .05 level. 
** Significant at .01 level. 


between categorical style and school learn- 
ing for girls was reduced considerably after 
the variance attributed to verbal and non- 
verbal IQ were removed from school learn- 
ing. 
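
The second series of analyses forces verbal and nonverbal IQ into the equation and then asks what a style measure adds. The sketch below illustrates that logic on simulated scores; none of the numbers come from this study. The helper names (`solve`, `r_squared`, `increment`) are hypothetical, only a single added predictor is shown rather than the 13 free-entering variables, and ordinary least squares via the normal equations is an assumption about the computation.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 from a least squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


def increment(forced, added, y):
    """R^2 of the forced predictors, the gain from one added predictor, and its F ratio."""
    r2_forced = r_squared(forced, y)
    r2_full = r_squared(forced + [added], y)
    df_residual = len(y) - (len(forced) + 1) - 1
    gain = r2_full - r2_forced
    return r2_forced, gain, gain / ((1 - r2_full) / df_residual)


# Simulated scores for 132 hypothetical subjects (not the study's data).
rng = random.Random(0)
verbal_iq = [rng.gauss(100, 15) for _ in range(132)]
nonverbal_iq = [0.5 * v + rng.gauss(50, 12) for v in verbal_iq]
style = [0.05 * v + rng.gauss(5, 3) for v in verbal_iq]
achievement = [0.03 * v + 0.01 * nv + 0.1 * s + rng.gauss(0, 1)
               for v, nv, s in zip(verbal_iq, nonverbal_iq, style)]

r2_iq, gain, f_ratio = increment([verbal_iq, nonverbal_iq], style, achievement)
print(f"R^2 for IQ alone: {r2_iq:.3f}; gain from style: {gain:.3f}; F = {f_ratio:.2f}")
```

Because the style measure is entered only after both IQ scores, its gain in R² reflects variance in achievement that IQ does not already account for, which is the quantity the "Increase" column of Table 4 reports.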


SUMMARY


The results of this study support earlier
contentions that standardized intellectual
ability and school achievement tests are
quite heterogeneous with respect to cogni-
tive style requirements. The findings indi-
cate, however, that additional variance was
accounted for by cognitive style measures
beyond that of verbal and nonverbal IQ in
relation to school learning.

Of the three style measures, descriptive
style was the most important contributor in
the prediction of school learning. For boys,
relational style contributed more often in
the prediction of school learning than
categorical style. Categorical style ac-
counted for additional variance in the pre-
diction of school learning more frequently
for girls than relational style. The con-
tribution of categorical style for girls, how-
ever, was reduced substantially after vari-
ance attributed to verbal and nonverbal IQ
was removed from school-learning variables.

The high intercorrelations among the 
three style scales for the instrument used in 
this study would suggest difficulty in ob- 
taining differential convergent validity for 
cognitive style. Despite these high intercor- 
relations, the differential relationships be- 
tween cognitive style and school learning 
found in this investigation indicate that 
convergent validity can be demonstrated for 
the notion of cognitive style in relation to 
school learning. Whether or not reducing the 
intercorrelations among scales would allow 
for greater clarification of the relationship 
between cognitive style and school learning 
than found in this study is a question which 
warrants further investigation. 

Based on the findings of this study, it 
would seem that before meaningful recom- 
mendations can be made for educational 
programming based on the cognitive style 
mapping of children, the relationship be- 
tween a particular cognitive style and a 
particular school-learning task must be 
taken into consideration. 
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SUGGESTIONS TO CONTRIBUTORS 


NOTICE TO APA AUTHORS


The American Psychological Association announces publication of the
second edition of the Publication Manual in August 1974. This new edition,
which supersedes the 1967 Manual, will be adopted by the 14 APA journals 
in 1975. 

The new Manual is more comprehensive than the previous edition. It up- 
dates APA publication policies and procedures and incorporates changes in 
editorial practice since 1967. For instance, APA now sends many authors their 
copy-edited manuscripts for review before they are set into type, and some 
editors now use blind review procedures. The second edition also includes up- 
to-date statements of the coverage of each APA journal including the Journal 
of Experimental Psychology which will be published in four separate sections 
in 1975. 

The new Publication Manual initiates several changes in APA style. These 
changes are announced in the August 1974 American Psychologist and will be 
introduced in the APA journals in January 1975. During the period of transi- 
tion to the new style, authors should note that (a) all manuscripts published 
in 1974 will be copy-edited according to the 1967 Manual, (b) manuscripts 
accepted in 1974 and published in 1975 will be copy-edited to conform to the 
new Manual. Starting in 1975, accepted manuscripts that depart significantly 
from the Manual will be returned to authors for correction. 

Authors will be encouraged by the changes in the second edition. The new 
APA style simplifies reference forms; eliminates unnecessary underlines, 
brackets, and other devices; supports appropriate use of “I” and "we"; and 
generally clarifies typing requirements. Material is arranged for maximum 
convenience to authors and typists, and all sections are cross-referenced and 
indexed. 

The new Publication Manual is available after August 1 for $3. Send orders 
to APA Publication Sales, 1200 Seventeenth Street, N.W., Washington, D.C. 
20036. Orders of $15 or less must include payment unless they are submitted 
on institutional purchase order forms. 
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LONG-TERM PREDICTIVE VALIDITY OF 
DIVERGENT-THINKING TESTS: 
SOME NEGATIVE EVIDENCE¹


NATHAN KOGAN²
New School for Social Research 
ETHEL PANKOVE 
Public Schools, Montgomery Township, New Jersey 


Studied the relationship of assessments of divergent-thinking perform- 
ance and intellective aptitude at fifth and tenth grades to nonacademic 
attainments assessed with a biographical questionnaire in the tenth 
grade and upon graduation. Stepwise multiple regression analysis based 
on middle-class subjects drawn from two school systems (ns of 46 and 
22 after attrition) indicates that fifth-grade ideational productivity had 
no predictive power in respect to nonacademic attainment at gradua- 
tion, whereas tenth-grade ideational productivity made a marginally 
significant contribution to that criterion in one of the school systems. 
Contrary to expectations, fifth- and tenth-grade intellective-aptitude 
measures account for modest to substantial amounts of the variation in 
nonacademic attainments. Assessment of such attainments is found to
be fairly stable from the tenth grade to graduation. Examination of
separate fields of nonacademic attainment indicates diverse relation-
ships with cognitive assessments. Implications for creativity research
are considered.


¹ This study was supported by the National In-
stitute of Child Health and Human Development
under Research Grant 5 P1 HD01762 to Educa-
tional Testing Service, Princeton, New Jersey. A
number of individuals have assisted and partici-
pated in the project since its inception. We are
especially grateful to Saul Cooperman and Edward
McKeon of the Montgomery Township, New
Jersey School System, and to Gary J. Estadt and
Celeste Rorro of the Lawrence Township, New
Jersey School System, for facilitating the research,
to Edward Nystrom and [illegible] serving as
[illegible], to Augusta Gross for coding of
[illegible], and to [illegible]rietta Gallagher for
supervision of data analysis.
Thanks are due Norman Frederiksen and William


Creativity research based on the use of
divergent-thinking tests has followed two
relatively independent paths. The first of
these can be subsumed under the general
heading of construct validation and con-
cerns the development of a theory to ac-
count for variation in divergent-thinking
performance under natural conditions and
as a consequence of diverse experimental
treatments. Investigators in this tradition
have examined cognitive and personality
correlates of divergent-thinking perform-
ance and have constructed a variety of ex-
perimental procedures intended to enhance
the level of ideational output on divergent-
thinking tasks. Wallach (1970) has re-
viewed the foregoing areas of research and
concluded that the ideational productivity
or fluency component of divergent thinking
is essentially independent of intelligence
and reflects a process best characterized as
“extensiveness of attention deployment.”

In a more recent publication, Wallach
(1971) has attacked the construct-valida-
tion approach to divergent thinking on the
grounds that the search for correlates of
ideational fluency and for methods of en-
hancing it has proceeded on the assumption
of a powerful link between that form of
divergent thinking and genuine “real-
world” creativity. Since the latter is ob-
viously more difficult to assess than is
ideational fluency, investigators have
chosen to concentrate on such test-based
measures with the justification that they
are experimental analogues of “real-life”
creative process. Such a stance prejudges
the issue, of course, and constitutes the core
of Wallach’s criticism.

It is possible to maintain, then, that re- 
search on divergent thinking from a con- 
struct-validation perspective can be faulted 
in the absence of evidence for predictive or 
concurrent validation of test performance 
against “real-world” criteria. Such valida- 
tion efforts constitute the second path that 
research on divergent thinking has taken. 
Torrance (e.g., 1972) reports a variety of 
validational evidence for his tests of crea- 
tivity, but other investigators have seriously 
questioned the positive character of that 
evidence (e.g., Crockenberg, 1972; Harvey, 
Hoffmeister, Coates, & White, 1970). Fur- 
ther, Wallach (1970) has documented the 
lack of independence between the Torrance 
Tests and IQ indices. Cropley’s (1972) five- 
year longitudinal study of the predictive 
validity of divergent-thinking tests is based 
in part on the Torrance procedures, hence, 
further complicating interpretation of the 
ambiguous outcomes of that study. 

The Wallach and Kogan (1965) diver- 
gent-thinking tasks have been shown to 
manifest independence of IQ in a wide range 
of studies, but the examination of the con- 
current and predictive validity of these 
tasks has been quite limited. Wallach and 
Wing (1969) obtained concurrent validity 
for the Wallach-Kogan procedures in a 
sample of recently graduated high school
students. Extracurricular activities and ac-
complishments comprised the criteria, and
these were not predicted by the verbal and
mathematical scores of the College Board’s
Scholastic Aptitude Test. The overall ef-
fects were not especially strong, however,
and the mode of analysis employed made it
impossible to assess the portions of the
variance in the criteria attributable to the
divergent-thinking and intellective indices. 

An initial attempt to explore the pre-
dictive validity of the Wallach-Kogan tasks
is reported by Kogan and Pankove (1972). 
Children whose divergent-thinking per- 
formance and IQ level had been measured 
in the fifth grade were reexamined on 
analogous measures five years later in the 
tenth grade. At the latter time, an assess- 
ment of extracurricular participation was 
also obtained. The findings were mixed, the 
divergent-thinking tasks exhibiting signifi- 
cant concurrent and predictive validity in 
the smaller, but not the larger, school sys- 
tem. Also of interest was the significant 
independent contribution of IQ to the varia- 
tion in the criteria in the smaller school 
system, suggesting that extracurricular ac- 
complishments may be a joint function of 
divergent and convergent modes of thinking. 
The negative results in the larger school 
system, however, raise doubts about the po- 
tential replicability of the findings in the 
smaller school system and prompt consid- 
eration of the possible unreliability of the 
criterion measurement. The tenth grade is 
rather early in the student’s high school 
career, and it is entirely feasible that the 
student’s record of extracurricular activity 
and accomplishment upon graduation from 
high school would represent a more solid 
index of an “intermediate” creativity crite- 
rion. 

In brief, the present paper offers an ex- 
tension of the Kogan and Pankove (1972) 
study. Students whose divergent-thinking 
and intellective-aptitude levels had been 
assessed at fifth and tenth grades were sur- 
veyed at graduation in respect to their ex- 
tracurricular involvements during their 
entire high school career. A rank-in-class
index was also available upon graduation.
The predictive validity (over approxi-
mately a seven-year and a two-year period)
of divergent- and convergent-thinking tests
in respect to "creative" and academic
achievement is reported in the body of the
paper.


METHOD


The fifth- and tenth-grade administrations of 
the Wallach-Kogan divergent-thinking measures 
are described in full in Pankove and Kogan (1968) 
and Kogan and Pankove (1972), respectively. In-
formation on intellective-aptitude levels was avail-
able in school records. The present research has
employed the overall IQ derived from the Cali-
fornia Test of Mental Maturity and the power
index derived from the Differential Aptitude Tests
at fifth and tenth grades, respectively.

The biographical inventory of extracurricular
activities and accomplishments devised by Wallach
and Wing (1969), which had been administered in
the tenth grade, was readministered in the twelfth
grade just prior to graduation in the smaller school
system and just after graduation in the larger
school system. The graduating classes consisted of
102 and 258 students in the smaller and larger
school systems, respectively. Both school systems
can be described as predominantly middle class.

In the smaller school system, the Wallach-Wing 
questionnaire was personally given to all of the 
graduating seniors in the longitudinal sample by 
the school psychologist, and all of the question- 
naires were returned. Nevertheless, considerable 
attrition occurred between fifth and twelfth grades. 
Out of a total of 46 children (25 males and 21 fe- 
males) assessed in the fifth grade, a grand total of 
22 (12 males and 10 females) graduated from the 
same school system seven years later. 

In the larger school system, it was not possible 
to obtain the cooperation of all of the graduating 
seniors in the longitudinal sample. The Wallach- 
Wing questionnaire was mailed to all of the fore- 
going students with a covering note explaining 
the questionnaire's purpose and offering com- 
pensation of $2 for the return of the questionnaire 
in an enclosed self-addressed stamped envelope. 
A follow-up letter was sent to those students who 
had not returned the questionnaire within three 
weeks of the original mailing date. Of the 69
students who were mailed questionnaires, 46 (67%)
returned them. Hence, of a total of 116 children
(59 males and 57 females) examined in the fifth
grade, only 46 (24 males and 22 females) were ac-
cessible for study seven years later.

In the Kogan and Pankove (1972) five-year
follow-up, considerable attrition had already taken
place: a 37.5% decline from the original sample
size. Virtually all of this attrition could be at-
tributed to family mobility. Those subjects who
dropped out of the study between the fifth and
tenth grades did not differ significantly in mean
divergent-thinking performance or intellective ap-
titude from those subjects who were in the same
school systems five years later.

In the case of the present seven-year follow-up,
attrition is multiply determined: family mobility,
dropping out of school prior to graduation, and
failing to return the questionnaire. While there
were no significant mean differences in fifth- and
tenth-grade scores between those individuals who
left the smaller school system between tenth and
twelfth grades and those who remained, the same
cannot be said for the larger school system. In the
latter, those individuals who dropped out of the
study between tenth grade and graduation (the
large majority for failing to return the question-
naire) had significantly lower mean fifth- and
tenth-grade intellective-aptitude scores than those
who remained in the study (ts of 3.27 and 3.[illegible],
p < .01, respectively). The dropped subjects, rela-
tive to those remaining in the study, also scored
lower on fifth-grade ideational productivity (t =
2.67, p < .01), but no difference was observed in
the case of tenth-grade ideational productivity
(t < 1.0).

In the earlier published reports (Kogan &
Pankove, 1972; Pankove & Kogan, 1968), the
Wallach and Kogan divergent-thinking tasks were
scored for both productivity (number of responses)
and uniqueness (number of unique responses).
These two variables are so highly correlated, how-
ever, that only the ideational productivity data
will be treated in the present report. In the earlier
studies cited, within-task correlations between
productivity and uniqueness ranged from .46 to
.97 with a median correlation of .72.

On the Wallach-Wing biographical inventory, 
subjects’ activities and accomplishments were 
scored in the areas of leadership, art, social service, 
literature, dramatic arts, music, and science. Each 
area was scored separately, and, in addition, a 
total score across all of the areas was obtained. 
Since the Wallach-Wing items within each area are
largely cumulative (i.e., arranged from lowest to
highest levels of accomplishment), weighted scores
were used. For example, a subject who checked all
of the items under one of the extracurricular areas
would receive the same score as the subject who
checked only that item representing the highest
level of accomplishment.³
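
One scoring rule consistent with this description is to credit each area with the highest level checked. The sketch below is an illustration under that assumption only: the text does not give the actual weights, and the item numbering (1 = lowest, 4 = highest), the area names, and the function names are hypothetical.

```python
def area_score(checked_levels):
    """Score one area as the highest level checked, or 0 if nothing was checked."""
    return max(checked_levels, default=0)


def total_score(checked_by_area):
    """Sum the area scores to obtain a total across all areas."""
    return sum(area_score(levels) for levels in checked_by_area.values())


# Hypothetical responses: one subject checks every music item, another only the top one.
checked_everything = {"music": [1, 2, 3, 4], "science": []}
checked_top_only = {"music": [4], "science": []}

print(total_score(checked_everything), total_score(checked_top_only))
```

Under this rule the two hypothetical subjects receive the same score, as the text requires, because lower-level items add nothing once a higher-level item in the same area has been checked.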


RESULTS 


Overall Extracurricular Activities and Ac- 
complishments 


Examination of the predictive validity
of fifth- and tenth-grade productivity and
intellective-aptitude measures in respect to
total extracurricular activities and accom-
plishments over the high school years was
carried out by means of stepwise multiple
regression analysis. Tenth-grade activities
and accomplishments and subject sex were
also included as predictors in the analysis.
Since the major aim of the research con-
cerned long-term prediction as well as the
specific efficacy of divergent-thinking tests,
fifth-grade independent variables were en-
tered in the analysis prior to the correspond-
ing tenth-grade measures, and the diver-
gent-thinking scores were considered prior
to the intellective-aptitude measures. The
outcomes for the smaller and the larger
school systems are shown in Table 1.

³ The quantification described does not, of
course, provide a clear-cut discrimination between
extracurricular activity, on the one hand, and
genuine accomplishment, on the other. At best, we
can merely state that those with high scores on the
Wallach-Wing scales are higher on a combined ac-
tivity-attainment index than those with low scores.
Hence, when the terms “activity,” “attainment,”
and “accomplishment” are employed in isolation
in the text and tables, the combined index is always
implied. The analytic separation of sheer extra-
curricular activity from genuine nonacademic ac-
complishment at the secondary school level con-
stitutes an important problem in its own right but
one that is beyond the scope of the present in-
vestigation.

TABLE 1
STEPWISE MULTIPLE REGRESSION ANALYSIS FOR PREDICTION OF OVERALL
HIGH SCHOOL ACTIVITIES AND ACCOMPLISHMENTS

                             Smaller school system             Larger school system
Predictor                   df     R      R²      F           df     R          R²      F
5th grade
  Productivity              20   -.038   .001   < 1.00        44    .012       .000   < 1.00
  Intellective-aptitude     19    .673   .453   15.67***      43    .165       .027   < 1.00
10th grade
  Productivity              18    .741   .550    3.88*        42   [illegible] .029   < 1.00
  Intellective-aptitude     17    .743   .551   < 1.00        41    .422       .178    7.42***
  Activities                16    .786   .618    2.79         40    .699       .489   24.99***
  Sex                       15    .786   .618   < 1.00        39    .739       .545    4.82**

* p < .10.
** p < .05.
*** p < .01.
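
The F ratios in Table 1 can be related to the cumulative R² values with a short calculation. The sketch below assumes that each F tests the gain in R² from adding one predictor, with one numerator degree of freedom and the tabled df as the residual degrees of freedom; that reading is an inference from the table, not something the text states. The function name is illustrative, and the inputs are the first three rows for the smaller school system.

```python
def f_for_increment(r2_before, r2_after, df_residual):
    """F ratio (one numerator df) for the gain in R^2 when one predictor is added."""
    return (r2_after - r2_before) / ((1.0 - r2_after) / df_residual)


# (predictor, residual df, cumulative R^2) from the smaller school system in Table 1.
steps = [
    ("5th-grade productivity", 20, 0.001),
    ("5th-grade intellective aptitude", 19, 0.453),
    ("10th-grade productivity", 18, 0.550),
]

r2_previous = 0.0
recomputed = {}
for label, df_residual, r2 in steps:
    recomputed[label] = f_for_increment(r2_previous, r2, df_residual)
    gain = r2 - r2_previous
    print(f"{label}: gain in R^2 = {gain:.3f}, F = {recomputed[label]:.2f}")
    r2_previous = r2
```

Under these assumptions the recomputed values agree with the tabled F ratios to within the rounding of the reported R² figures, and the gains of roughly .45 and .10 correspond to the percentages of criterion variance discussed in the text.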

It can be readily seen that ideational 
productivity assessed in the fifth grade com- 
pletely fails to predict overall extracurricu- 
lar activities and accomplishments assessed 
seven years later at graduation. This nega-
tive outcome holds for both school systems.
In all other respects, the pattern of find-
ings differs across the smaller and larger
systems. In the former, intellective-aptitude
level assessed at fifth grade is a powerful
predictor of involvement and accomplish-
ment in extracurricular activities several
years later. Approximately 45% of the vari-
ance in such activities can be accounted 
for on the basis of subjects’ intellective- 
aptitude level assessed in the later elemen- 
tary school years. Only ideational produc- 
tivity measured in the tenth grade con- 
tributes a marginally significant increment 
of predictive power, accounting for an addi- 
tional 10% of the variance in the criterion. 
The negligible contribution of the tenth-grade
activities index simply reflects the
fact that fifth-grade intellective-aptitude 
level is almost as strongly correlated with 
it (r = .52) as with the criterion index (r = 
.66). In the Kogan and Pankove (1972) 
report, fifth-grade intellective-aptitude was 
a strong predictor of tenth-grade activities 
in the smaller school system, but both fifth- 
and tenth-grade productivity also ac- 
counted for significant portions of the crite- 
rion variance. The passage of time, then, 
appears to have had the effect of reducing 
the importance of the ideational productivity
dimension and correspondingly enhancing
the importance of intellective
aptitude assessed as early as elementary 
school. 
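
The step-by-step F values in Table 1 can be checked against the cumulative R² column. The following is a minimal sketch, assuming the conventional F ratio for the gain in R² when one predictor enters the equation, with the residual degrees of freedom taken from the df column of the table. The function name and argument names are illustrative and do not come from the original analysis.

```python
def f_for_increment(r2_before, r2_after, df_resid, n_added=1):
    """F ratio for the gain in R^2 when n_added predictors enter the equation.

    r2_before: cumulative R^2 before the step
    r2_after:  cumulative R^2 after the step
    df_resid:  residual degrees of freedom after the step
    """
    gain_per_predictor = (r2_after - r2_before) / n_added
    error_per_df = (1.0 - r2_after) / df_resid
    return gain_per_predictor / error_per_df


# Smaller school system, tenth-grade productivity step (values read from Table 1).
f_small = f_for_increment(r2_before=0.453, r2_after=0.550, df_resid=18)
print(round(f_small, 2))
```

Under these assumptions the tenth-grade productivity step in the smaller school system agrees with the tabled F for that row; other rows may differ in the second decimal because the tabled R² values are rounded.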

In the case of the larger school system, 
Kogan and Pankove (1972) observed that 
the combined fifth- and tenth-grade diver- 
gent-thinking and intellective-aptitude in- 
dices accounted for less than 5% of the 
variance in tenth-grade activities and ac- 
complishments. Approximately two and a 
half years later, the picture has changed to 
some degree. Though fifth-grade measures 
continue to have virtually no predictive 
power, Table 1 indicates that tenth-grade 
intellective-aptitude level now makes a significant
contribution, with total accountable
variance in the criterion increased to approximately
18%. At the same time, tenth-grade
activities clearly represent the strongest
predictor, accounting for an additional
30% of criterion variance. In short, prior
cognitive assessments are less predictive of
subsequent talented behavior in the larger 
than in the smaller school system. Earlier 
behavior of a similar nature is evidently the 
best predictor in the larger school system of 
the overall level of extracurricular involve- 
ment and accomplishment upon graduation. 
It should be noted, however, that the pre- 
dictive power of the tenth-grade activities 
index is somewhat inflated by the part- 
whole relation to twelfth-grade activities. 
The latter index reflects total extracurricu- 
lar performance and, hence, necessarily 
incorporates activities assessed two and a 
half years earlier. The significant F value 
for sex in the larger school system indicates 
a somewhat higher level of extracurricular 
activity and accomplishment for females. 


Academic Achievement 


The prediction of rank in class shows 
considerable correspondence between the 
two school systems (Table 2). In both, 
fifth-grade intellective aptitude makes a 
modest contribution to rank in class at 
graduation. Similarly, tenth-grade intel- 
lective aptitude in both school systems has 
considerable predictive power in respect to 
academic achievement, as indexed by rank 
in class. It should also be noted that apart
from the sex variable in the larger school
system (females manifest higher academic
achievement), the accountable portion of the
variance in the criterion is largely attributable
to the cognitive assessments (primarily
intellective aptitude). Tenth-grade
activities and accomplishments do not offer
an additional increment to the criterion
variance in either school system. This latter
finding supports the observations of Holland
and Richards (1965) regarding the
independence of academic and nonacademic
accomplishment. Those authors also report
that intellective-aptitude measures and nonacademic
attainments at the high school
level are independent in concurrent assessments.
This kind of outcome was not obtained
in the longitudinal assessments of the
present study.


TABLE 2
STEPWISE MULTIPLE REGRESSION ANALYSIS FOR PREDICTION OF HIGH SCHOOL RANK IN CLASS

                           Smaller school system                              Larger school system
Predictor                  df           R            R²           F            df           R            R²           F
5th grade
  Productivity             20           .083         .007         < 1.00       44           -.070        .005         < 1.00
  Intellective-aptitude    19           .397         .157         3.39*        43           [illegible]  .172         8.26**
10th grade
  Productivity             18           .553         .306         [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
  Intellective-aptitude    [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
  Activities               [illegible]  [illegible]  [illegible]  < 1.00       40           [illegible]  [illegible]  < 1.00
  Sex                      15           .838         .702         < 1.00       39           .707         .500         8.16**

[probability footnotes illegible]


Activities and Accomplishments within
Specific Fields


The relatively poor predictive power of
the divergent-thinking measures in respect
to overall nonacademic attainments may
possibly be attributable to the unwarranted
combination of such attainments across
fields to yield a single total score. Though
Wallach and Wing (1969) obtained evidence
for concurrent validation using such
a score, further analysis of their data indicated
that the ideational productivity measures
discriminated only for the areas of art,
writing, science, and leadership. No such
discrimination was obtained in the areas
of dramatics, music, and social service. It
is clearly important, then, to determine
whether consideration of the particular
areas of nonacademic attainment will fundamentally
alter the form of the outcomes
reported in Table 1.

Tables 3 and 4 contain the correlations
between the cognitive assessments and nonacademic
attainments in the separate fields
for the smaller and larger school systems,
respectively. Those tables also report the
tenth- versus the twelfth-grade correlations
for each field of nonacademic attainment,
the correlation between sex and fields of
nonacademic attainment, the multiple correlation
for each field, and the correlations
between each predictor and total nonacademic
attainment.


TABLE 3
CORRELATIONS FOR THE SMALLER SCHOOL SYSTEM BETWEEN COGNITIVE ASSESSMENTS AND NONACADEMIC
ATTAINMENTS IN DIFFERENT FIELDS

                                     Fields of nonacademic attainment
Predictor                  Leadership  Art     Social  Writing      Dramatic  Music   Science  Total
5th grade
  Productivity             -.03        -.02    .05     -.11         .15       .03     -.18     -.04
  Intellective-aptitude    .48*        .46*    .35     [illegible]  .28       .38     .44*     .66**
10th grade
  Productivity             .28         .27     .40     .27          .39       .48*    .01      .49*
  Intellective-aptitude    .13         .20     .13     .21          .17       .38     .26      .36
  Activityᵃ                .60**       .49*    .58**   .48*         .56**     .82**   .05      .64**
  Sexᵇ                     -.18        -.16    .08     .06          -.30      -.54*   .40      -.14
  R                        .74*        .63     .70     .63          .71       .83**   .65      .79*

Note. For the smaller school system, n = 22.
ᵃ This row presents correlations between the identical activity in 10th and 12th grades.
ᵇ Positive correlations reflect greater attainment by males, negative correlations by females.
*p < .05.
**p < .01.


A scan of the first row of Tables 3 and 4 
quickly reveals the lack of any relationship 
of fifth-grade ideational productivity to 
nonacademic attainment in all of the fields 
surveyed. Hence, the absence of any predic- 
tive power for fifth-grade productivity ob- 
served in Table 1 is evidently not con- 
cealing any particular within-field effects. 
TABLE 4
CORRELATIONS FOR THE LARGER SCHOOL SYSTEM BETWEEN COGNITIVE ASSESSMENTS AND NONACADEMIC
ATTAINMENTS IN DIFFERENT FIELDS

                                     Fields of nonacademic attainment
Predictor                  Leadership   Art     Social       Writing      Dramatic  Music        Science  Total
5th grade
  Productivity             .07          -.14    .01          .08          .09       -.03         .00      .01
  Intellective-aptitude    .35*         -.24    -.09         .29*         -.03      .16          .12      .16
10th grade
  Productivity             [illegible]  .20     -.17         .22          .05       -.12         -.28     .01
  Intellective-aptitude    .45**        .10     -.03         .35*         .09       .33*         .08      .39**
  Activityᵃ                .48**        .48**   .30*         .34*         .36*      .69**        -.12     .63**
  Sexᵇ                     -.30*        -.18    [illegible]  [illegible]  -.25      -.28         -.01     -.20*
  R                        [illegible]  .61**   .39          .46          .47       [illegible]  .35      [illegible]

Note. For the larger school system, n = 46.
ᵃ This row presents correlations between the identical activity in 10th and 12th grades.
ᵇ Positive correlations reflect greater attainment by males, negative correlations by females.
*p < .05.
**p < .01.


In contrast, fifth-grade intellective-aptitude
level is significantly related to nonacademic
attainment in several fields: leadership and
writing in both school systems as well as art
and science in the smaller school system.
At tenth-grade level, there is a trend toward
significant relations between ideational
productivity and nonacademic attainment
in the smaller school system, but
only the correlation for music attains sta- 
tistical significance. In the larger school 
system, ideational productivity at tenth 
grade is no more predictive of nonacademic 
attainment than was ideational productiv- 
ity at fifth grade. In regard to tenth-grade 
intellective-aptitude level, the resemblance 
to the corresponding fifth-grade pattern is 
quite strong in the larger school system. For 
the smaller school system, on the other 
hand, tenth-grade intellective aptitude is a 
considerably poorer predictor of nonacademic
attainments than is the corresponding
fifth-grade index. Indeed, with the one exception
of science, the ideational productivity
correlations uniformly exceed the
intellective-aptitude correlations in the
tenth-grade assessment. This constitutes the 
sole evidence in the present study for the 
possible efficacy of ideational productivity 
for nonacademic attainments. 


DISCUSSION AND CONCLUSIONS


On the whole, the results of the present 
longitudinal investigation support those 
authors who have questioned the practice of 
applying the “creativity” label to divergent-thinking
tasks (e.g., Hudson, 1966;
Nicholls, 1972). There is absolutely no indication
in the data of the present study that
divergent thinking assessed in the later
years of elementary school is prognostic of
nonacademic attainment during the high
school years. It is possible to maintain, of
course, that nonacademic attainment in sec- 
ondary school represents a poor criterion of 
adolescent creativity. That may well be 
so, but it is difficult to envision a more 
desirable or appropriate one. It can also be 
argued that divergent-thinking tests might 
operate in terms of a “sleeper-effect” prin- 
ciple, lacking predictive validity in the high 
school adolescent years for “intermediate” 
criteria but gaining predictive power in 
respect to the “ultimate” criteria of mature
adulthood. Again, such an effect may eventually
be uncovered, but, until such time,
the burden of proof clearly falls upon those 
who would equate performance on diver- 
gent-thinking tests with genuine creative 
achievement. 

A lack of predictive validity does not 
necessarily imply the absence of concurrent 
or construct validity. The present investi- 
gation has merely shown that fifth-grade 
divergent-thinking performance is of little 
prognostic value in regard to talented non- 
academic attainment in high school. There 
is good reason to believe that fifth-grade 
children with high scores on ideational pro- 
ductivity differ from those with low scores 
on a variety of cognitive and personality 
measures (Wallach & Kogan, 1965). The 
domain of childhood divergent-thinking 
assessment may be somewhat analogous to 
infant intelligence measurement, where it 
has also been demonstrated that infant test 
scores are highly informative of the child’s 
cognitive status at the time but have little 
predictive validity for IQ at school age or 
later (e.g., Bayley, 1970). 

Divergent-thinking measures obtained in 
tenth grade offered only a slight improve- 
ment over the comparable fifth-grade mea- 
sures in respect to predictive validity. In the 
smaller school system, the former accounted 
for a marginally significant portion of the 
variance in nonacademic attainment at 
graduation. The previously reported (Ko- 
gan & Pankove, 1972) concurrent validity 
for tenth-grade ideational productivity was 
considerably more substantial in that school 
system, pointing to the weak “staying” 
power of such assessments. 

The basis for the school system differ- 
ences constitutes an intriguing unknown in 
the present research. In the absence of a 
thorough ecological analysis of the two 
school systems under study, we are ob- 
viously better equipped to raise questions 
than to provide answers. Given the evi- 
dence that 45% of the variation in non- 
academic attainment at graduation in the 
smaller school system can be accounted for 
by an intelligence assessment made seven 
years earlier, in contrast to a mere 2.7% of
the variation in the larger system, one can
conjecture as to whether the smaller school 
system subtly encourages its brighter 
youngsters to become involved in extracur- 
ricular activities, whereas such involvement 
develops more at the student’s own initia- 
tive in the larger school system. Even at the 
level of tenth grade in the larger school sys- 
tem, intellective aptitude is accounting for 
less than 20% of the variance in nonaca- 
demic attainment. This leaves a great deal 
of room for the operation of noncognitive 
motivational and personality determinants. 
If a smaller school system encourages 
earlier and more extensive involvement in 
extracurricular pursuits, as proposed by 
Barker and Gump (1964) and empirically 
confirmed in the present samples (Kogan & 
Pankove, 1972), one might anticipate that 
the more popular, energetic, and uninhibited 
students would play a dominant role in such 
nonacademic activities. The modest predictive
utility of tenth-grade ideational productivity
hence suggests that the foregoing
measure may be tapping dimensions of 
energy and responsiveness more than a 
cognitive capacity as such. Cronbach (1968) 
has offered a similar interpretation of idea- 
tional fluency. 

The most consistent data in the present 
longitudinal study (apart from the negligible
predictive validity of fifth-grade productivity
measures) are the substantial associations
between tenth- and twelfth-grade
nonacademic attainments. Very likely, such
associations will extend into the college 
years, as Richards, Holland, and Lutz 
(1967) have demonstrated. If future longi- 
tudinal research should demonstrate that 
these “intermediate” criteria are predictive 
of “ultimate” criteria of creative achieve- 
ment in adulthood, we shall be forced to 
question the utility of cognitive testing for
“creativity” in the high school years or even
earlier (if reliable indices of nonacademic 
attainment are available). Such consistency 
cannot be taken for granted, however. The 
base rates for genuine adult creativity are 
likely to be exceedingly low, assuming that 
a generally acceptable operational definition
of it can be formulated (see Kogan,
1973). Until research has clarified the issue
of the link between high school or college
“creativity” and subsequent adult creativity,
the concern expressed in certain quarters
(e.g., Wing & Wallach, 1971) about
possible talent loss in the college admissions
process may well be premature.


INFLUENCE OF CHILDREN'S SEX ROLE STANDARDS
ON READING AND ARITHMETIC ACHIEVEMENT 


CAROL ANNE DWYER* 
Educational Testing Service, Princeton, New Jersey 


The relationship between sex role standards and reading and arithmetic 
achievement was examined. It was hypothesized that children's sex 
role standards, assessed by checklist, would predict their achievement 
test scores. Subjects were 385 middle-class Caucasian children in 
Grades 2, 4, 6, 8, 10, and 12. Multiple regression analyses indicated that 
sex role standards contributed significant variance to reading and 
arithmetic achievement test scores. This effect was stronger for males 
than females. The results suggest that reading and arithmetic sex 
differences are more a function of the child's perception of these areas 
as sex-appropriate or sex-inappropriate than of the child's biological 
sex, individual preference for masculine or feminine sex role, or liking 
or disliking of reading or arithmetic. 


This study examined the relationship be- 
tween sex role standards (the extent to which 
the individual considers certain activities 
appropriate to males or to females) and 
achievement in the areas of reading and 
arithmetic. It was hypothesized that for 
both females and males, sex role standards 
would be a contributing factor to achieve- 
ment and would predict achievement scores 
in reading and arithmetic less well than IQ
but better than biological sex or liking or
disliking the subject. 

The concept of sex role standards was 
originally set forth by Kagan (1964), who 
found that second and third graders con- 
sidered many school-related objects and 
activities feminine. This concept was substantially
developed by Stein and Smithells
(1968, 1969), Stein (1971), and Stein, Pohly,
and Mueller (1971), who examined age and
sex differences in children’s sex role standards
about achievement. Stein and Smithells
(1969) found that both females and
males considered reading activities feminine 
and arithmetic activities masculine. 

Considerable research has been done on
the subject of sex differences in reading and
arithmetic achievement. In general, females
have been found to excel in reading, and 
males in arithmetic, but the differences have 
varied with age, socioeconomic status, IQ, 
and the specific subskill being measured 
(Anastasi, 1958; Dwyer, 1973; Lewis, 1968; 
Maccoby, 1966). It was concluded for the 
purposes of this study that biological sex 
and liking or disliking a subject area were 
insufficient explanations for observed pat- 
terns of sex differences in achievement. 
Several studies have attempted to relate sex 
role standards to achievement behaviors. 
Carey (1958) showed that for male and fe- 
male college students, sex differences in 
problem-solving proficiency were in part a 
reflection of attitudes toward the sex-appro- 
priateness of problem solving and could be 
diminished through group discussions. Milton
(1959) found that when the characteristics
of problems were made less appropriate
to the masculine sex role, the sex differences
in problem-solving performance which
had previously been observed were dimin- 
ished. 

Stein, Pohly, and Mueller (1971) found 
that two determinants of achievement mo- 
tivation—attainment value and expectancy 
of success—were influenced by the individual's
perception of the sex-role appropriateness
of the task. Subjects showed higher
attainment value and expectancy of success 
on sex-appropriate tasks. 

Only one other study (Mazurkiewicz,
1960) had dealt directly with the relationship
between sex role standards and academic
achievement. Using a sample of
eleventh-grade males, Mazurkiewicz found
that reading achievement test scores were 
higher for those boys who considered read- 
ing a masculine activity. This is the only 
study that has made an attempt to relate 
sex role standards to actual achievement 
test information, and even here, only males 
were used as subjects and only at one grade 
level. 


METHOD 


Subjects 


The subjects were 385 Caucasian children attending
public schools in a suburban, northern California
community. All lived in a small geographical
area consisting of one large, homogeneous housing
development whose residents were all of similar
socioeconomic status (predominantly high school
graduates, employed in clerical capacities and as
skilled or semiskilled workers). The subjects were
approximately equally divided between males and
females and approximately equally divided among
Grade Levels 2, 4, 6, 8, 10, and 12.
The mean IQ for this sample was 105.8, based
on group IQ tests routinely administered by the
school district.


Materials 


Information obtained for each subject included 
IQ scores; reading and arithmetic achievement test
scores (the Stanford Achievement Test in Grades
2, 4, 6, and 8, and the Iowa Tests of Educational
Development in Grades 10 and 12); a sex role standards
checklist scored for both reading and arithmetic;
and an individual sex role preference checklist
scored for reading, arithmetic, and sex typing.

The sex role standards (SRS) checklist and the
individual sex role preference (ISRP) checklist
were modeled after checklists devised by Stein and
Smithells (1969) to assess children's sex role standards
about achievement and other areas. Additional
items were written for the present study in
the areas of reading and arithmetic, since Stein
and Smithells' lists contain only six items in each
of these categories. The SRS and ISRP checklists
each consist of 46 items representing interests and
activities in many areas, including 10 each in reading
and arithmetic. The two checklists contained
the same items but in different order and with
different directions to the subject: The SRS checklist
asked the subject which interests and activities
he or she thought boys or girls preferred; the ISRP
checklist asked the subject which interests and
activities he himself or she herself preferred. On
the SRS, the subject would respond by circling
a “B” (boys prefer this activity), an “S” (both
sexes like this activity equally), or a “G” (girls
prefer this activity). On the ISRP checklist, the
subject could respond by circling an “L” (I like
this activity), a “?” (I cannot decide whether I
like this activity), or a “D” (I dislike this activity).


Procedures 


One female examiner administered all tests, 
with the exception of the tenth- and twelfth-grade 
achievement tests, which were administered by high 
school personnel of both sexes. Tests were admin- 
istered in class-sized groups. In Grade 2, only the 
checklists were administered; this was done in 
small groups in which the examiner read each item 
aloud and the child marked his or her own response. 
The checklists were introduced informally as ques- 
tionnaires designed to assess responses of students 
of different ages and sexes. The use of the “S” response
in the SRS checklist and the “?” response in
the ISRP checklist was mildly discouraged.

The SRS checklist was scored by assigning a
score of 1 to the response “B,” 2 to the response
“S,” and 3 to the response “G.” The checklist
originally contained 10 items representing reading 
and 10 representing arithmetic, but 2 of the 10 
reading items and 3 of the 10 arithmetic items 
were eliminated because they failed to correlate 
highly enough with the total score for each cate- 
gory. The possible range of SRS scores, then, was 
from 8 (all 8 items judged masculine) to 24 (all 
8 items judged feminine) for reading and from 7 
(all 7 items judged masculine) to 21 (all 7 items 
judged feminine) for arithmetic. 
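
The scoring rule just described can be sketched in a few lines. This is only an illustration of the 1/2/3 point assignment and the resulting score ranges; the names `SRS_POINTS`, `srs_score`, and `srs_range` are hypothetical and were not part of the original study materials.

```python
# Points assigned to each SRS response, as described in the text:
# "B" (boys prefer) = 1, "S" (both sexes equally) = 2, "G" (girls prefer) = 3.
SRS_POINTS = {"B": 1, "S": 2, "G": 3}


def srs_score(responses):
    """Sum the points for one subject's responses to the retained items."""
    return sum(SRS_POINTS[r] for r in responses)


def srs_range(n_items):
    """Lowest (all masculine) and highest (all feminine) possible score."""
    return n_items * SRS_POINTS["B"], n_items * SRS_POINTS["G"]


print(srs_range(8))  # reading: 8 retained items
print(srs_range(7))  # arithmetic: 7 retained items
print(srs_score(["B", "S", "G", "G", "S", "G", "G", "S"]))
```

With 8 retained reading items and 7 retained arithmetic items, the ranges produced are the ones stated in the text, and a subject who circles “S” for every item lands at the midpoint of the range.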

Three scores were computed for the ISRP checklist:
a reading preference score (ISRP_r), an arithmetic
preference score (ISRP_a), and a sex role
conformity (sex-typing) score (ISRP_st). The ISRP
checklist was scored by assigning a score of 1 for 
each reading or arithmetic item disliked, 2 for 
each response of “?,” and 3 for each item liked. The 
ISRP sex-typing score was based on the number of 
items the subject actually liked or disliked which 
had previously been chosen most often by the en- 
tire group as being appropriate for boys or for girls. 
That is, the total peer group responses on the SRS 
checklist determined which items (among all 46) 
were the most masculine and which were the most 
feminine, and the ISRP_st score reflected the degree
to which the individual subject conformed to
or deviated from the items most strongly associated
(by group consensus) with his or her sex. The total
sex-typing score had a possible range of from
[illegible] to 60 (in the former, the subject disliked all the
items chosen by his or her peers as being appropriate
to his or her sex and liked all the items chosen
by his or her peers as being inappropriate to his or
her sex, the lowest sex-typed score, and in the
latter, the subject liked all the items chosen by the
peer group as being appropriate to his or her sex
and disliked all the items chosen by the peer group
as inappropriate to his or her sex, the most highly
sex-typed score).


RESULTS 


The central hypotheses of this study were 
tested by a series of multiple regression 
analyses, using applications developed by Adler
(1971) and Cohen (1968). Reading and 
arithmetic data were treated separately, 
first for the entire sample, then for each 
grade level separately, and then for males 
and females separately. In each of the mul- 
tiple regression analyses, the relative im- 
portance of each variable in predicting 
achievement scores was considered in two 
ways. First was the unique variance, an es- 
timate of the importance of a variable apart 
from the effects of all the other variables 
that were entered into the regression equa- 
tion. This estimate is obtained by entering 
all the other variables except the specified 
variable into the regression equation and 
subtracting the resulting sum (the variance 
accounted for by all the other variables) 
from the estimate obtained by entering all 
the variables (including the specified variable)
into the equation. The second estimate
of the relative importance of a variable is
the combination of a variable’s unique variance
and the variance it holds in common with the
other variables.
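
The two estimates can be made concrete with a short sketch. It assumes ordinary least squares with an intercept and reads "unique plus common variance" as the R² obtained when the variable is entered alone, which is how the single-variable rows of the regression tables appear to be built. The function names and the small data set are hypothetical illustrations, not the study's data or software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 from regressing y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def unique_variance(columns, y, j):
    """R^2 with all predictors minus R^2 with predictor j left out."""
    others = [c for i, c in enumerate(columns) if i != j]
    return r_squared(columns, y) - r_squared(others, y)


def unique_plus_common(columns, y, j):
    """R^2 when predictor j is entered alone."""
    return r_squared([columns[j]], y)


# Made-up scores for eight hypothetical subjects.
iq = [95, 100, 105, 110, 115, 120, 98, 107]
srs = [12, 20, 14, 18, 16, 22, 19, 13]
achievement = [40, 52, 50, 58, 60, 70, 49, 51]

predictors = [iq, srs]
print(unique_variance(predictors, achievement, 1))
print(unique_plus_common(predictors, achievement, 1))
```

Because the reduced equation is nested within the full one, the unique variance cannot be negative apart from rounding error, whereas the unique plus common figure depends on how strongly the variable overlaps with the other predictors.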


Reading 


Table 1 indicates that SRS_r contributed
more unique plus common variance than sex
or ISRP_r and less unique plus common variance
than IQ or grade level (Items A
through E, Column 3). The SRS_r accounted
for 4.5% of the variance in reading achievement
scores when considered in combination
with other variables. With the effects of the
other variables partialed out, SRS_r accounted
for 1.5% of the total variance. This
figure is significant at the .001 level.

The sex-typing score derived from the 
ISRP checklist was found to be nearly un- 
related to reading achievement scores. The 
unique plus common variance accounted for 
by ISRP_st was .01%, and the unique variance
accounted for was .0%.

The relationship of IQ, SRS_r, and sex to
reading achievement scores was analyzed
for the entire sample and for each grade
level separately. In Grades 2 and 4, the
unique variance accounted for was negligible
(.6% and .01% respectively). The effects of
SRS_r were strongest in Grades 8 and 10, accounting
for 7.75% and 4.4%, respectively.

Data were also analyzed separately for
females and for males. The effects of IQ on
achievement were approximately equal for
females and males. The effects of SRS_r were
stronger for males than for females (for females,
.0198 unique variance contributed by
SRS_r; for males, .0351 unique variance).


TABLE 1
MULTIPLE REGRESSION: IQ, SRS_r, ISRP_r, SEX, GRADE LEVEL, AND READING ACHIEVEMENT

Independent variable     R        R²            Increment
IQ (A)                   .4038    .1630
SRS_r (B)                .2129    .0453
ISRP_r (C)               .0286    .0008
Sex (D)                  .0381    [illegible]
Grade level (E)          .3189    .1017
A + B + C + D + E        .6104    .3726         A over (B + C + D + E) = .2171
                                                B over (A + C + D + E) = .0151*
                                                C over (A + B + D + E) = .0022
                                                D over (A + B + C + E) = .0034
                                                E over (A + B + C + D) = .1782

Note. Abbreviations: SRS_r = sex role standards checklist, reading; ISRP_r = individual sex role
preference checklist, reading.
*p < .001.

In sum, sex role standards about reading 
still accounted for between 1% and 8% of 
the total variance in reading achievement 
test scores even when the effects of sex, IQ, 
and liking or disliking reading were partialed 
out. The effect held true at all grade levels 
but was strongest at Grade Levels 6 and 
above. 


Arithmetic 


Table 2 describes the relationship of the
variables IQ, SRS_a, ISRP_a, sex, and grade
level to arithmetic achievement test scores
for the entire sample. This table indicates
that SRS_a contributed more unique plus
common variance than ISRP_a or sex and less
unique plus common variance than grade
level or IQ. The SRS_a accounted for 5.97%
of the variance in arithmetic achievement
scores when considered in combination with
other variables. With the effects of the other
variables partialed out, SRS_a contributed
2.6% unique variance. This figure is significant
at the .01 level.

The sex-typing score derived from the 
ISRP checklist was found to be nearly un- 
related to arithmetic achievement scores (as 
it was to reading achievement scores). The 
unique plus common variance accounted for 
by ISRP_st was .006% and unique variance
accounted for by ISRP_st was .008%.

The relationship of IQ, SRS_a, and sex to
arithmetic achievement scores was analyzed
for the entire sample and for each grade
level separately. The unique variance accounted
for by SRS_a was lowest in Grade 12
(3%) and highest in Grades 8 and 10
(5.67% and 8.7%, respectively). In this
analysis by separate grade levels, it was only
at Grades 8 and 10 that the SRS_a unique
variance figure reached an acceptable level
of significance.

Data were analyzed for the entire sample 
and for females and males separately. IQ 
contributed more unique variance to arithmetic
achievement score for males than for
females. There was no significant sex differ- 
ence in the correlations of IQ with reading, 
but for females the correlation between IQ 
and reading was significantly higher than 
the correlation between IQ and arithmetic. 
As in reading, the effects of sex role stan- 
dards in arithmetic were stronger for males 
than for females. For males the reverse was 
true. IQ was a better predictor of girls’ read- 
ing achievement than their arithmetic 
achievement (and the reverse for boys), and 
it is assumed that social (sex role standards) 
factors accounted for part of this difference. 

Sex role standards about arithmetic still 
accounted for between 1% and 9% of the var- 
iance in arithmetic achievement scores, even 
when the effects of IQ, sex, and liking or 
disliking arithmetic were controlled. 

The relationship between sex role standards
and arithmetic achievement held true
for both males and females but, as in reading,
was stronger for males than for females
(1% vs. 4%).


TABLE 2
MULTIPLE REGRESSION: IQ, SRS_a, ISRP_a, SEX, GRADE LEVEL, AND ARITHMETIC ACHIEVEMENT

Independent variable     R        R²       Increment
IQ (A)                   .3967    .1574
SRS_a (B)                .2444    .0597
ISRP_a (C)               .0254    .0006
Sex (D)                  .0870    .0076
Grade (E)                .3383    .1144
A + B + C + D + E        .6026    .3631    A over (B + C + D + E) = .1902
                                           B over (A + C + D + E) = .0260*
                                           C over (A + B + D + E) = .0007
                                           D over (A + B + C + E) = .0008
                                           E over (A + B + C + D) = .1604

Note. Abbreviations: SRS_a = sex role standards checklist, arithmetic; ISRP_a = individual sex role
preference checklist, arithmetic.
*p < .01.

Contrary to expectation, individual sex 
role preference scores contributed very little 
variance to reading or arithmetic achievement.
No explanations for this were apparent,
but it was also noted that the expected
sex differences in ISRP_r and ISRP_a
did not emerge for this sample. ISRP_r scores
favored girls (girls stated that they liked
reading more than boys so stated), but there
were no significant sex differences in ISRP_a
scores.


Sex Role Standards 


The SRS (reading) scores were significantly (t =
12.08, df = 352, p < .01) higher than 16,
indicating that the subjects considered
reading more feminine than masculine. Similar
results were obtained for females and
males at every grade level. The SRS (arithmetic) scores
were significantly lower than the expected
mean value of 14 only for females in Grades
10 and 12 (t = -2.16, df = 32, p < .05;
t = -2.39, df = 27, p < .05).


Other Sex Differences 


To determine sex differences, t tests were 
carried out. There were no significant differ- 
ences in ISRP sex-typing scores, in IQ, or 
achievement scores until Grade 12, where 
the difference favored males. Differences in 
arithmetic achievement scores favored boys 
at all grade levels but reached significance 
only in Grade 2 and for all grade levels com- 
bined. 

All tests for form and order of administration
effects were found to be nonsignificant.


Discussion 


The stronger effect of sex role standards
on males' than on females' achievement is
most probably a reflection of the greater
latitude our culture allows females in participating
in the male role than males in
participating in the female role.

The response patterns on the sex role
standards checklist also support the observation
that stronger social sanctions exist
against males participating in the female 
sex role. The response patterns suggest that 
while males are subject to these greater 
sanctions, it is they themselves who define 
them. The response patterns on the sex role 
standards questionnaire showed that there 
was more agreement between the sexes as 
to what constituted girls’ things than there 
was as to what constituted boys' things and 
that it was the boys who accounted for this 
disparity. Boys were more likely to label 
things as exclusively male than girls were. 
However, girls were not significantly more 
likely than boys to label things feminine 
(that is, boys agree with girls as to what 
constitutes feminine activities). Boys also 
gave fewer "S" responses than girls on the 
sex role standards checklist and more re- 
sponses indicating that certain interests and 
activities are appropriate to their own sex. 
Another interesting finding in the SRS 
checklist response patterns was a trend of 
the younger subjects in this sample to show 
more rigidity concerning sex roles than the 
older subjects. On the sex role standards 
checklist, older subjects made many more 
responses of "S" than did the younger sub- 
jects. This is in direct contradiction to the 
common research assumption that a high 
degree of sex role differentiation is a char- 
acteristic of increasing personal maturity. 
It is possible that this finding reflects a 
growing awareness of the personally limiting 
properties of strong sex role typing (as op- 
posed to individual choice) and that a ma- 
ture synthesis may give higher priority to 
personal abilities and interests than to main- 
tenance of social sex role traditions. 


REFERENCES 


Adler, P. T. Ethnic and socioeconomic status dif- 
ferences in human figure drawings. Journal of 
Consulting and Clinical Psychology, 1971, 36, 
344-354. 

Anastasi, A. Differential psychology. (3rd ed.) New 
York: Macmillan, 1958. 

Carey, G. Sex differences in problem solving as 
function of attitude differences. Journal of Ab- 
normal and Social Psychology, 1958, 56, 256-260. 

Cohen, J. Multiple regression as a general data-
analytic system. Psychological Bulletin, 1968,
70, 426-443.




Dwyer, C. A. Sex differences in reading: An evalu- 
ation and a critique of current theories. Review 
of Educational Research, 1973, 43, 455-467. 

Kagan, J. The child’s sex role classification of school 
objects. Child Development, 1964, 35, 1051-1056. 

Lewis, E. C. Developing woman's potential. Ames:
Iowa State University Press, 1968.

Maccoby, E. E. The development of sex differences.
Stanford, Calif.: Stanford University Press, 1966. 

Mazurkiewicz, A. J. Social-cultural influences and 
reading. Journal of Developmental Reading, 
1960, 3, 254-263. 

Milton, G. A. Sex differences in problem solving 
as a function of role appropriateness of the 
problem content. Psychological Reports, 1959, 5, 
705-708. 

Stein, A. H. The effects of sex-role standards for
achievement and sex-role preference on three
determinants of achievement motivation. Developmental
Psychology, 1971, 4, 219-231.

Stein, A. H., Pohly, S. R., & Mueller, E. The in- 
fluence of masculine, feminine, and neutral tasks 
on children's achievement behavior, expectancies 
of success, and attainment values. Child Devel- 
opment, 1971, 42, 195-207. 

Stein, A. H., & Smithells, J. The sex-role standards 
about achievement held by Negro and White 
children from father-present and father-absent 
homes. Unpublished manuscript, Cornell Univer- 
sity, 1968. 

Stein, A. H., & Smithells, J. Age and sex differences 
in children’s sex-role standards about achieve- 
ment. Developmental Psychology, 1969, 1, 252- 
259. 


(Received December 26, 1973) 




Journal of Educational Psychology
1974, Vol. 66, No. 6, 817-825


CHILDREN’S SOLUTION PROCESSES IN ARITHMETIC 
WORD PROBLEMS' 


DANIEL J. A. ROSENTHAL AND LAUREN B. RESNICK

The influence of mathematical and linguistic factors in solving verbally
presented arithmetic problems was investigated in two groups of third-
grade children. Problems in which events were mentioned out of
chronological order were more difficult to solve. Problems with the
starting set unknown (verbal representations of equations such as
__ + c = e) were more difficult and took longer than ending set unknown
problems. An interaction indicated that problems with verbs
denoting a gain (verbal representations of addition equations) were
more difficult than loss verb problems when the starting set was unknown,
and easier when the ending set was unknown. Two possible
solution process models for unknown starting set problems are discussed
on the basis of solution time data.


Word problems in arithmetic are tasks 
which require the integration of linguistic 
and arithmetic processing skills. In word
problems, a situation is described in which 
there is some modification, exchange, or 
combination of quantities. On the basis of 
this verbal description, the individual solv- 
ing the problem must construct a represen- 
tation of the arithmetic operations called 
for; this representation mediates the arith- 
metic solution process. 

There is a small body of research on arithmetic
problems which attempts to uncover
the processes involved in performing arithmetic
operations such as addition and sub-
traction (cf. Groen & Parkman, 1973; Park- 
man & Groen, 1971; Restle, 1970; Suppes & 
Groen, 1967; Woods, 1972). In this research, 
problems are presented numerically in the 
form of simple algebraic equations. A few 
studies of this kind have related the position 
of the unknown set to the solution process 
employed (cf. Groen & Poll, 1973; Peter- 
son & Aller, 1971; Suppes, Hyman, & Jer- 
man, 1967), but in this work there has been 
no attention to the role of verbal or situa- 
tional context on arithmetic performance. 
Another small body of research on arith- 
metic word problems has attempted to relate 
language variables to the way in which 
verbal messages are encoded and stored. 
Steffe (1967) showed that children made 
fewer errors solving problems in which the 
names of the sets to be added were the same 
(e.g., cars and cars) than in problems where 
the set names were different (e.g., jacks and 
marbles). Loftus and Suppes (1972) found 
a significant relationship between the structural
complexity of the language in a prob-
lem and the probability of making a correct 
response. Paige and Simon (1966) per- 
formed a detailed analysis of the way in 
which adults translate verbal information 
in algebra word problems into equations;
however, they did not attempt to study the
effect of these representational processes on 
the actual solution of the equations. 
Outside the arithmetic context, there ex- 
ists a large and varied body of literature on 
verbal comprehension processes, that is, the 
ways in which verbal messages are repre- 
sented and stored. In much of this research, 
the organization of recall is used as the index 
of representational strategy. There has been 
little exploration of the way in which differ- 
ent forms of verbal expression and repre- 
sentation might affect the use of information 
in subsequent problem-solving behavior. 
The study of processes used in solving 
simple arithmetic word problems affords an 
opportunity to study the relationship be- 
tween verbal processes and nonlinguistic 
task performance in a context that has a 
relatively high degree of “ecological valid- 
ity.” Most arithmetic processing tasks that 
are encountered outside the classroom arise 
from a situation in which equations or 
equivalent representations must be con- 
structed as well as solved. Similarly, verbal 
information is typically recalled or recon- 
structed for use in some problem-solving 
or communication task, a condition difficult 
to simulate using typical laboratory tasks. 
The present study focuses on a carefully 
defined set of word problems that permits 
examination of the effects of specific aspects 
of the verbal stimuli and the problem situa- 
tion. All problems required only a single op- 
eration, either addition or subtraction. All 
quantities manipulated were greater than 
one and less than 10. All problems were 
presented in a verbal form that contained 
three separate clauses, one describing the 
“starting set,” one describing the "change 
set,” and one describing the "ending set." 
The problems differed in three dimensions: 
(a) the order of mention of chronological 
events, that is, the sets were mentioned in 
chronological order (starting, change, end- 
ing) or in reverse order (ending, change, 
starting); (b) the identity of the unknown
set, that is, either the starting or the ending
set was the unknown set in a problem (e.g.,
"5 - 2 = __" versus "__ + 2 = 5"); and (c)
the type of verb associated with the change
set; the verb expressed either a gain or a
loss, suggesting an initial equation involving
addition or subtraction, respectively.
Order of mention. Prior research has
shown that when the order in which events
are mentioned in a verbal expression does
not match the chronological order of occurrence,
recall is more difficult. Clark and
Clark (1968) argued that events mentioned
out of temporal order (as, for example, in
the sentence, "He swiped the cabbages after
he tooted the horn [p. 130]") are marked in
memory so as to distinguish the order of
occurrence from the order of mention. Bever
(1970) suggested that underlying the
Clarks' results was a basic habit of listening
in which "in the comprehension of ordered
events we organize relations by starting
with the first event, organizing other events
as subsidiary to the first [p. 286]." Huttenlocher
and Weiner (1971) have shown that
when ordering objects in response to a descriptive
statement of their relationships,
children have a tendency to move the first-
mentioned object first. Thus, they begin by
processing the first-mentioned item.

This research suggests that in a word 
problem in which order of mention is the 
reverse of chronological order, subjects’ nor- 
mal processing habits are disrupted. They 
may either treat the first-mentioned set erroneously
as the starting set, that is, fail to
mark the discrepancy of orders; or they may 
search for the starting set and then organize 
the remainder of the problem around it. 
Thus, the following two hypotheses can be 
made for the order of mention variable: 


1. There are more errors in backward 
order of mention than in forward order of 
mention problems. 

2. For correctly solved problems, back- 
ward order problems have longer solution 
times than forward order problems. 


Identity of the unknown set. Work by
Suppes et al. (1967) has established that for
algebraically presented addition problems
of the form s + c = e (where the symbols
s, c, and e refer to the starting set, change
set, and ending set, respectively), problems
where the starting set is unknown (i.e., __ +
c = e) are more difficult and take longer to
solve than problems where the ending set is
unknown (i.e., s + c = __). The increased
difficulty of problems where the initial set is 
unknown presumably results from the differ- 
ent procedures required for the solution of 
problems in this form. Previous research 
with algebraically presented addition prob- 
lems where the final set was the unknown, 
that is, s + c = __ (cf. Suppes & Groen,
1967; Groen & Parkman, 1972), has sup- 
ported a counting model of problem solution 
in which children store the larger of the two 
given numbers in a mental counter, and then 
increment by the smaller number. Although 
no solution model for starting set unknown 
problems has been clearly supported (cf. 
Groen & Poll, 1973; Suppes et al., 1967), it 
is clear that the solution procedures for 
problems where the ending set is unknown 
cannot be directly applied to these problems 
because (a) the presence of a starting set is 
a requirement for this model to operate, and 
(b) in starting set unknown problems, the 
unknown set does not stand alone to one 
side of the equal sign, so the procedure of 
combining the two given numbers using the 
given operation does not give the correct 
answer. This means that some additional 
operations must be involved in solving un- 
known starting set problems. The following 
hypotheses can thus be stated for the iden- 
tity of the unknown set variable: 


3. Word problems where the unknown set
is the starting set are more difficult to solve 
than problems where the unknown set is the 
ending set. 

4. For correctly solved problems, un- 
known starting set problems have longer 
solution times than unknown ending set 
problems. 
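
The counting model cited above for ending set unknown addition (store the larger given number, then increment by the smaller) can be sketched as follows. This is only an illustration: the function name is hypothetical, and returning the number of increments as a stand-in for solution time is an assumption, not something the cited studies specify in this form.

```python
def count_on_from_larger(a, b):
    """Solve a + b = __ by setting a mental counter to the larger addend
    and incrementing it once for each unit of the smaller addend.
    Returns (answer, number_of_increments)."""
    counter, steps_needed = max(a, b), min(a, b)
    increments = 0
    while increments < steps_needed:
        counter += 1
        increments += 1
    return counter, increments
```

Under this sketch, 5 + 3 and 3 + 5 both need three increments, so predicted difficulty depends on the smaller addend only. The routine also needs both addends before it can begin, which is why it cannot be applied as it stands to problems of the form __ + c = e.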


Type of verb. There is no basis in past research
for making specific predictions concerning
the effect of gain versus loss verbs.
The relative difficulty of numerically presented
simple addition and subtraction problems
(i.e., s + c = __ and s - c = __) has
never been directly compared, to our knowledge.
Further, in our problems, gain verbs
do not necessarily imply the use of addition
operations nor do loss verbs necessarily imply
the use of subtraction operations. Problems
where the starting set is unknown, as
in problems based on __ + c = e, might be
solved by transformation into an equivalent
canonical form, i.e., e - c = __ (cf. Suppes
et al., 1967), thus involving a subtraction
operation in a gain verb problem. Alternatively,
such problems might be solved by
a trial and error procedure in which various
values of s are tested directly in the initial
__ + c = e equation, thus using an addition
operation. The data from the present experiment
should allow more specific hypotheses
concerning solution of problems of these
kinds to be formulated.


METHOD 


Subjects 


Two groups of third-grade children from two 
public schools served as subjects in two replications
of the experiment. The subjects in both
schools were from predominantly white working- 
class families. The 29 subjects in Experiment 1, 
15 boys and 14 girls, were just beginning the third 
grade. The 34 subjects in Experiment 2, 15 boys 
and 19 girls, were just completing the third grade. 
Both subject groups represented a cross-section of 
the pupil population with regard to academic 
standing. 


Stimulus Materials 


Item forms (Hively, Patterson, & Page, 1968; 
Osburn, 1968) were used to generate a series of 
arithmetic word problems which systematically 
varied with respect to the three factors previously 
defined: order of mention, identity of the un- 
known set, and type of verb. Initially, we defined 
(a) a set of generalized frames or item forms, con- 
taining both fixed and variable elements; (b) a list 
of replacement sets for the variable elements; and 
(c) a set of substitution rules that specified which
replacement sets could fill which variable element 
positions. Problems were generated by assigning 
replacement sets to their corresponding variable 
element positions, while the fixed element posi- 
tions remained constant. The item forms, replace- 
ment sets, and substitution rules completely 
defined the universe from which the experimental 
problems were drawn. 
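
The item-form procedure described above (fixed frame, variable elements, replacement sets, substitution rules) can be sketched in a few lines. The frame below is a hypothetical encoding of one forward-order, gain-verb form with the ending set unknown. The placeholder names and both functions are assumptions made for illustration and are not the authors' materials.

```python
# Hypothetical encoding of one item form: fixed text with variable elements in braces.
ITEM_FORM = ("If {subject} started out with {s} {objects} and he {gain_verb} "
             "{c} {objects}, how many {objects} did he end up with?")

def satisfies_rules(given, change):
    """Substitution rule stated in the text: the change set is smaller than
    the other given set, and both values lie between 2 and 9."""
    return 2 <= change < given <= 9

def generate(item_form, **replacements):
    """Fill each variable element of an item form with one replacement."""
    return item_form.format(**replacements)

problem = generate(ITEM_FORM, subject="Paul", s=5, c=3,
                   objects="boats", gain_verb="bought")
```

Because the fixed elements never change, any two problems generated from the same form differ only in the replacements, which is what lets the three experimental factors be varied independently of wording.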

The item forms were based on a subset of the 
general equations defined by Suppes et al. (1967, 
p. 168). These number problem equations are 
shown in the leftmost column of Table 1. The 
number equations differ with respect to the posi- 
tion of the unknown set (starting or ending) and 
the sign of the operation (plus or minus). Each 
number equation appears twice, once for word
problems in the forward order of mention (Forms
1-4) and once for word problems in the backward
order of mention (Forms 5-8).


TABLE 1
NUMBER PROBLEM ITEM FORMS, WORD PROBLEM ITEM FORMS, AND SAMPLE PROBLEMS

Forward order of mention

1. s + c = __
   Word problem item form: If subject started out with s objects and he gained c objects, how many objects did he end up with?
   Sample problem: If Paul started out with 5 boats and he bought 3 boats, how many boats did he end up with?

2. s - c = __
   Word problem item form: If subject started out with s objects and he lost c objects, how many objects did he end up with?
   Sample problem: If John started out with 7 cars and he sold 2 cars, how many cars did he end up with?

3. __ + c = e
   Word problem item form: How many objects did subject start out with if he gained c objects and he ended up with e objects?
   Sample problem: How many stamps did Dave start out with if he found 2 stamps and he ended up with 6 stamps?

4. __ - c = e
   Word problem item form: How many objects did subject start out with if he lost c objects and he ended up with e objects?
   Sample problem: How many balls did Bill start out with if he lost 2 balls and he ended up with 4 balls?

Backward order of mention

5. s + c = __
   Word problem item form: How many objects did subject end up with if he gained c objects and he started out with s objects?
   Sample problem: How many boats did Paul end up with if he bought 3 boats and he started out with 5 boats?

6. s - c = __
   Word problem item form: How many objects did subject end up with if he lost c objects and he started out with s objects?
   Sample problem: How many cars did John end up with if he sold 2 cars and he started out with 7 cars?

7. __ + c = e
   Word problem item form: If subject ended up with e objects and he gained c objects, how many objects did he start out with?
   Sample problem: If Dave ended up with 6 stamps and he found 2 stamps, how many stamps did he start out with?

8. __ - c = e
   Word problem item form: If subject ended up with e objects and he lost c objects, how many objects did he start out with?
   Sample problem: If Bill ended up with 6 balls and he lost 2 balls, how many balls did he start out with?

The item forms for word problems are in the 
center column of Table 1. Item forms 1-4 present 
information about the sets in the forward order 
of mention, that is, in chronological order. Item 
forms 5-8 present the information in the backward 
order of mention, that is, in reverse chronological 
order. The ending set is unknown for Item Forms 
1, 2, 5, and 6, and the starting set is unknown for 
Forms 3, 4, 7, and 8, in accordance with the number 
problem upon which the item form was based. 


TABLE 2
LISTS OF THE REPLACEMENT SETS FOR THE VARIABLE ELEMENTS OF THE ITEM FORMS

Subject   Gain verb   Loss verb   Object    Indirect object   Number pair
Bill      bought      sold        balls     sister            5, 3
Dave      found       lost        blocks    brother           5, 2
Dick      took        gave        boats     father            6, 2
Fred      got         sent        books     mother            7, 2
John                              cars
Mark                              pens
Pete                              stamps
Paul                              trucks


Similarly, Forms 1, 3, 5, and 7 accept gain verbs, 
and Forms 2, 4, 6, and 8 accept loss verbs, in 
accordance with the sign of the operation of their 
underlying number problem. Restrictions were 
applied such that the verbs bought, found, sold,
and lost always appeared in sentences with no
indirect object, while took, got, gave, and sent
always appeared in sentences containing an
indirect object phrase. Sentences with and without
indirect object phrases were equally distributed
across all experimental conditions.
The replacement sets used to generate item
sentences are shown in Table 2, and a sample
problem for each item form is shown in the
rightmost column of Table 1. The variable elements
of the item forms and the replacement sets
substituted in the sample items are shown in italics
for the sake of clarity. During item generation,
number pairs were selected so that the change set
was always smaller than the other given set, that
is, the starting or the ending set. This made it
possible to either add or subtract the change sets
and thus prevented the subjects from obtaining
the correct solution operation by simply eliminating
one of the two possible operations. The values
for s, c, and e ranged from 2 to 9, as suggested by
the number pairs shown in Table 2. The gain and
loss verbs were chosen to be "minimal contrast"
pairs, that is, bought and sold, found and lost, took
and gave, and got and sent.* Theoretically, this
selection procedure limited the difference between 
gain and loss verb item forms to the single 
semantic feature, “acquisitiveness” (cf. McNeill, 
1960; Perfetti, 1968). The aim was to make prob- 
lems taking gain and loss verbs as semantically 
similar as possible. 

A total of 32 word problems were generated, 
four from each item form. Replacement items were 
assigned to the variable elements of the item forms 
so that (a) the four number pairs were fully, and 
therefore, equally represented in the four problems 
generated from each item form; and (b) the four 
gain and the four loss verbs were fully, and 
therefore, equally represented in the four problems 
generated from each gain and loss item form. The 
experimental problems were typed in large-sized 
primary characters on plain 5 × 8 inch index cards.
The cards were arranged into four blocks of eight 
problems. Each block contained one problem from 
each item form. Problems within a block were 
arranged randomly, and problem blocks were 
presented in random order. 


Apparatus 


The apparatus consisted of a plexiglass response 
box and a 1/100 second timer. The side of the box 
facing the child sloped upwards at an angle of 60 
degrees, and a rack on the upper portion held the 
current problem card. The lower portion contained 
a set of 10 push-button microswitches arranged in
an arc. The microswitch caps were numbered from 
1 to 10, and were one-half inch in diameter. One 
of 10 correspondingly numbered indicator lights 
mounted on the back of the box lighted when a 
response button was depressed. The lights enabled 
the experimenter to directly read the subjects’ 
responses. The timer was activated automatically 
by seating a problem card in the rack, and halted 
automatically when a response button was de- 
pressed. Latencies were recorded from the timer by 
the experimenter. 


Procedure 


Prior to the experimental task each subject
solved 20 arithmetic number problems arranged in
five randomized blocks. Sixteen of these problems
were constructed from the number pairs in Table 2,
with the unknown position designated by an underline.
Four additional problems were constructed,
one from each form in Table 1, so that all of the
response keys would be included in the set of
correct responses. Subjects were instructed to solve
a problem and then to press the appropriate
answer key with the index finger of the preferred
hand. The subjects were told that the object of
the task was to get as many problems correct as
they could, and that they would be informed of
their score after every fourth problem and again
at the end of the task. A buzzer pressed by the
experimenter served as a ready signal prior to
presenting a problem. The warm-up task allowed
subjects to become familiar with the apparatus
and afforded the opportunity to assess basic computational
skills, which were defined as performance
on s + c = __ and s - c = __ problems. The subjects
in Experiment 1 had an average of 92%
correct; the subjects in Experiment 2 had an average
of 99% correct.

* The minimal contrast for the word sent should
more properly be receive, but this word was considered
too difficult for third-grade subjects.

Immediately following the warm-up task, each 
subject solved 32 arithmetic word problems. The 
subjects were instructed to read each problem 
aloud and, after solving it, to press the appro- 
priate answer key with the index finger of the 
preferred hand. Subjects were told again that the 
object of the task was to get as many problems 
correct as they could and that they would be in- 
formed of their score after every eighth problem 
and again at the end of the task. Subjects were 
given a three-minute rest when half of the prob- 
lems were finished. The average time to complete 
the warm-up and experimental tasks combined 
was about 25 minutes. 


RESULTS


Error Data 


A 2 × 2 × 2 analysis of variance with repeated
measures on every factor was performed
for both experiments to determine
the effects of identity of the unknown set 
(starting versus ending), type of verb (gain 
versus loss), and order of mention (forward 
versus backward). 

The results were almost identical for the
two replications. Except for the effect of
type of verb, which was borderline in Experiment 1,
the three main effects were significant
in both experiments as follows:
identity of the unknown set (F = 103.09,
df = 1/28, p < .001; F = 63.42, df = 1/33,
p < .001); type of verb (F = 3.87, df =
1/28, p < .06; F = 9.01, df = 1/33, p <
.01); and order of mention (F = 20.22, df =
1/28, p < .001; F = 24.93, df = 1/33, p <
.001). There were two significant interaction
effects in both experiments, identity of the
unknown set with order of mention (F =
18.84, df = 1/28, p < .001; F = 4.73, df =
1/33, p < .05), and identity of the unknown
set with type of verb (F = 19.51, df = 1/28,
p < .001; F = 8.13, df = 1/33, p < .01).
Mean number of errors for the cells involved
in these interactions are shown in Table 3.

TABLE 3
MEAN ERRORS AND STANDARD DEVIATIONS FOR SIGNIFICANT INTERACTIONS

                         Order of mention                         Type of verb
                    Experiment 1       Experiment 2       Experiment 1     Experiment 2
Set                 Forward  Backward  Forward  Backward  Gain    Loss     Gain    Loss

Starting unknown
  Mean              4.28     6.07      3.47     4.88      6.10    4.24     5.03    [missing]
  SD                2.28     2.09      2.29     2.37      2.09    2.43     2.53    [missing]

Ending unknown
  Mean              1.38     1.41      .79      1.38      .93     1.86     1.03    [missing]
  SD                1.42     1.38      .91      1.26      1.22    1.98     1.27    [missing]


The effect for identity of the unknown set
was as predicted (Hypothesis 3): there were
more errors when the unknown set was the 
starting set (X = 10.34 versus 2.79 in Ex- 
periment 1; 8.35 versus 2.18 in Experiment 
2). In accord with Hypothesis 1, there were 
more errors when the order of mention was 
backwards (X = 7.48 versus 5.66 in Experi- 
ment 1; 6.26 versus 4.26 in Experiment 2). 
Gain verb problems were also more difficult 
than loss verb problems (X = 7.03 versus 
6.10 in Experiment 1; 6.06 versus 4.47 in 
Experiment 2). The latter two effects, how- 
ever, were due mainly to the greater diffi- 
culty of backward order and gain verb prob- 
lems when the starting set was the unknown 
set, as is shown in Table 3. 
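
The marginal means quoted in this paragraph can be checked against the order of mention cells in Table 3, since each marginal is the sum of two cell means. The sketch below assumes decimal points in the table entries (for example .79 and .93) and allows a tolerance of about .01 for rounding; all names are illustrative.

```python
# Cell means from Table 3, order of mention columns: (forward, backward).
order_cells = {
    1: {"starting": (4.28, 6.07), "ending": (1.38, 1.41)},
    2: {"starting": (3.47, 4.88), "ending": (0.79, 1.38)},
}
# Marginal means quoted in the text.
reported_unknown = {1: {"starting": 10.34, "ending": 2.79},
                    2: {"starting": 8.35, "ending": 2.18}}
reported_order = {1: (5.66, 7.48), 2: (4.26, 6.26)}  # (forward, backward)

def close(a, b, tol=0.015):
    """Agreement within rounding of two-decimal table entries."""
    return abs(a - b) <= tol

# Summing across order of mention should give the unknown-set marginals.
unknown_ok = all(close(sum(order_cells[e][u]), reported_unknown[e][u])
                 for e in (1, 2) for u in ("starting", "ending"))

# Summing across unknown set should give the order-of-mention marginals.
order_ok = all(close(order_cells[e]["starting"][i] + order_cells[e]["ending"][i],
                     reported_order[e][i])
               for e in (1, 2) for i in (0, 1))
```

Both checks hold, so the reported means appear to be errors summed over the collapsed factor rather than averaged over it.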


Reaction Time 


Response latencies were used as a basis 
for inferring the processes employed for cor- 
rectly solving the different classes of prob- 
lems. The reaction time data is based on 
correctly solved problems in the second 
through the fourth blocks; the first block 
was eliminated to achieve more stable re- 
action times. Two latencies in Experiment 1 
exceeded 30 seconds (59.92 and 57.55 sec- 
onds). They were classified as outliers and 
excluded from the data analysis on the as- 
sumption that a different process was used 
in solving these problems than others of the 
same class. The means for each subject in 
each cell of the design were calculated for 
the remaining latencies. Due to the high 
error rate in some cells, only four of the 29 
subjects in Experiment 1 and 12 of the 34 
subjects in Experiment 2 had a full complement
of cell means and could be included in
the data analysis.
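
The screening steps just described (correct responses only, first block dropped, outliers removed, complete cases retained) can be sketched as below. The record layout, the function name, and the treatment of the 30-second figure as a fixed cutoff are assumptions for illustration; the article reports only that two latencies above 30 seconds were excluded.

```python
from collections import defaultdict
from statistics import mean

# The eight cells of the 2 x 2 x 2 design: (unknown set, verb type, order of mention).
CELLS = [(u, v, o) for u in ("starting", "ending")
         for v in ("gain", "loss") for o in ("forward", "backward")]

def cell_means(trials, cutoff=30.0):
    """trials: dicts with keys subject, block, cell, correct, latency (seconds).
    Keeps correct responses from blocks 2-4 at or under the cutoff, and returns
    {subject: {cell: mean latency}} only for subjects with all eight cells."""
    kept = defaultdict(lambda: defaultdict(list))
    for t in trials:
        if t["correct"] and t["block"] >= 2 and t["latency"] <= cutoff:
            kept[t["subject"]][t["cell"]].append(t["latency"])
    return {s: {c: mean(v) for c, v in cells.items()}
            for s, cells in kept.items() if all(c in cells for c in CELLS)}
```

A subject who missed every problem in even one cell has no mean for that cell and drops out, which is how a high error rate shrinks the sample available for the repeated measures analysis.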

A 2 × 2 × 2 analysis of variance with
repeated measures on every factor was performed
on the latency data for both experiments.
Only the identity of the unknown set
emerged as a clearly significant main effect
in both experiments (F = 12.82, df = 1/8,
p < .05; F = 21.93, df = 1/11, p < .001).
The mean reaction times for starting set and
ending set unknown problems were 17.18
versus 14.43 seconds in Experiment 1; 13.88
versus 11.68 seconds in Experiment 2. Thus,
Hypothesis 4 was supported. However, Hypothesis 2
was not, since there was no significant
effect for order of mention. A statistically
weak (F = 2.99, df = 1/11, p < .12)
effect of order of mention occurred in Experiment 2,
with backward order problems
taking approximately one second longer to
solve than forward order problems (X =
13.26 versus 12.30 seconds). The interaction
between the Identity of the Unknown Set ×
Type of Verb approached significance in Experiment 2
(F = 4.42, df = 1/11, p < .06),
and is plotted in Figure 1. As the graph
makes clear, gain verb problems took substantially
longer when the unknown set was
the starting set, while they took somewhat
less time when the ending set was unknown.


Discussion 


Of the four hypotheses initially stated,
three were clearly supported. Backward
order of mention problems produced more
errors than did forward order problems
(Hypothesis 1). Problems with an unknown
starting set produced both more errors (Hypothesis 3)
and longer latencies (Hypothesis 4)
than did problems with the ending set
unknown. There was no clearly significant
latency effect for the order of mention variable
(Hypothesis 2).

FIGURE 1. Reaction time data for Experiment 2: Interaction of identity of the unknown
set with type of verb (N = 12).


Order of Mention Variable 


The main effect of the order of mention
on the error data was accompanied by an
interaction with the identity of the unknown
set (see Table 3). This interaction is difficult
to interpret since it reflects a virtual absence
of errors for ending set unknown problems.
These problems may have been so easy as
to overpower the effect of order of mention.
In such a case, there would really be only
a main effect of order of mention to explain.
To find out whether there is a true interaction
of these variables, what is needed is
further experimentation in which both error
and latency effects can be examined under
conditions in which ending set unknown
problems are not so easy to solve.


Identity of the Unknown Set Variable 


The identity of the unknown set emerged
as the most powerful variable affecting problem
solution in both experiments. When the
unknown set was the starting set, problems
were more difficult to solve. There were
more errors, and successful solutions took
longer. The counting models that describe
processes for solving number problems where
the unknown set is in the ending position
(cf. Groen & Parkman, 1972; Suppes &
Groen, 1967; Woods, 1972) suggest a likely
processing model for word problems based
on these number problems; that is, subjects
probably processed the numerical information
in the word problems by storing (setting
a mental counter to) the value of the starting
set and then incrementing or decrementing
by the value of the change set. However,
for word problems where the unknown set
was the starting set, these counting models
could not be applied directly because no
starting set value was given. Two solution
models for starting set unknown problems,
each of which would adequately account for
the present data, can be considered.

The canonical transformation model. The first model was
suggested by Suppes et al. (1967). In this model the subject is
assumed to transform the given number equation into an
algebraically equivalent "canonical form," that is, an equation
with the unknown alone and to the right of the equal sign. Thus,
given the gain equation _ + c = e, subjects transform it into
the canonical equation e − c = _, which requires subtraction for
the solution; likewise, _ − c = e becomes e + c = _. The
resulting canonical equations, which require subjects to use the
operation opposite from the one given in the stimulus problem,
are then solved using a counting model. The canonical model
predicts more errors and longer solution times for problems
where the unknown set is the starting set because an additional
process for transforming the given equation is required.
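
The two stages of the canonical model (transform, then count) can be illustrated with a minimal sketch. The function name, the "gain"/"loss" labels, and the use of one counter step per unit of the change set are assumptions made for illustration; they are not taken from Suppes et al. (1967).

```python
def solve_canonical(verb, change, end):
    """Solve a starting-set-unknown problem in two stages.

    Stage 1 (transformation): rewrite the given equation so that the
    unknown stands alone, which reverses the stated operation.
        gain:  _ + c = e   becomes   e - c = _
        loss:  _ - c = e   becomes   e + c = _
    Stage 2 (counting): set a counter to e and step it c times.

    Returns (start, counting_steps). Hypothetical helper, for illustration.
    """
    if verb == "gain":
        step = -1  # canonical equation requires subtraction
    elif verb == "loss":
        step = 1   # canonical equation requires addition
    else:
        raise ValueError("verb must be 'gain' or 'loss'")

    counter = end
    steps = 0
    for _ in range(change):
        counter += step
        steps += 1
    return counter, steps


if __name__ == "__main__":
    print(solve_canonical("gain", 3, 8))  # _ + 3 = 8
    print(solve_canonical("loss", 3, 5))  # _ - 3 = 5
```

In this sketch the counting stage costs the same for gain and loss problems, so any difference between them would have to come from the transformation stage, which is where the model places it.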

Suppes et al. (1967) found that starting set unknown equations
with subtraction signs (i.e., _ − c = e) took less time to solve
and caused fewer errors than those with addition signs. Our word
problem data parallel these findings. For starting set unknown
problems, gain verbs caused more errors than loss verbs
(Table 3), and took longer to solve (Figure 1). The greater
difficulty of gain over loss problems can be accounted for
within the canonical model by assuming that the transformation
is more complex for gain verb problems. This assumption is
reasonable, since for gain problems a decision concerning which
quantity should appear first in the transformed equation must be
made (i.e., Does _ + c = e become e − c = _, or c − e = _?).
Loss problems, however, transform to addition statements, which,
since addition is commutative, require no such decision.

The trial and error model. An alternative model for solution of
unknown first problems assumes a trial and error solution
procedure in which subjects attempt to fit a number directly
into the initial equation. According to this model, the subject
sets the unknown equal to some number, s' (chosen randomly or
systematically), and then increments or decrements by the change
set, c, yielding a trial solution e'. e' is then compared with
the given ending set; if they are the same, then s' is declared
as the answer; if not, a new s' is selected and tested. Like the
canonical model, the trial and error model predicts increased
latencies for problems where the starting set is unknown because
the counting procedure would have to be applied a variable
number of times, depending upon the pattern of s' choices,
rather than only once as for problems where the ending set is
unknown.

The trial and error model accounts for the finding that loss
verb problems with unknown starting sets are solved more quickly
than parallel gain verb problems in the following way: Loss verb
problems yield a given equation of the type _ − c = e; this form
of equation permits subjects to make the inference that the
value of the starting set, s, must be equal to or greater than
the change set, c. By contrast, in equations of the type
_ + c = e, the change set provides no constraint on the possible
values of s. For loss verb problems, the constraint on the
possible values of s' may facilitate the selection of an initial
value with which the trial and error algorithm can be initiated
and may thus allow the trial and error process to begin more
quickly. Also, this constraint limits the area of search for the
unknown value, so that fewer iterations of the trial and error
process are needed. For gain verb problems, the subject must
estimate a good starting value for s' and will also, on the
average, test more values.
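
A minimal sketch of the trial and error procedure follows. It assumes a systematic upward search from a first guess; the function name, the `first_guess` and `limit` parameters, and the upward search order are illustrative assumptions, since the model as stated allows random or systematic choice of s'.

```python
def solve_trial_and_error(verb, change, end, first_guess=0, limit=100):
    """Fit candidate starting values s' into the given equation.

    Each candidate is incremented (gain) or decremented (loss) by the
    change set to give a trial ending set e', which is compared with the
    given ending set. Returns (start, number_of_candidates_tested).
    Hypothetical helper, for illustration only.
    """
    if verb not in ("gain", "loss"):
        raise ValueError("verb must be 'gain' or 'loss'")

    trials = 0
    for guess in range(first_guess, limit + 1):
        trials += 1
        trial_end = guess + change if verb == "gain" else guess - change
        if trial_end == end:
            return guess, trials
    raise ValueError("no solution found within the search limit")


if __name__ == "__main__":
    # Loss problem _ - 3 = 5, searched without and with the constraint s >= c.
    print(solve_trial_and_error("loss", 3, 5, first_guess=0))
    print(solve_trial_and_error("loss", 3, 5, first_guess=3))
```

Under these assumptions, starting the search at the change set (the constraint s ≥ c available for loss problems) reduces the number of candidates tested from 9 to 6 for the problem _ − 3 = 5. The sketch illustrates only this narrowing of the search; it does not by itself establish that gain problems require more trials on average.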




As the above discussion suggests, both the 
trial and error and the canonical transfor- 
mation models for solving unknown first 
problems appear consonant with the data of 
the present experiment. However, the pres- 
ent latency data are limited, thus requiring 
summation across individuals whose strate- 
gies may in fact be different, and also in- 
troducing considerable variability which 
limits the possibility of drawing strong con- 
clusions with regard to either model. In ad- 
dition, it is unknown exactly how the lin- 
guistic demands of the present task might 
affect processing strategy for these problems. 
For these reasons further experiments designed to test the
validity of both models, perhaps using numerically presented
equations rather than verbal problems, are required.


RELATIONSHIP BETWEEN MENTAL ABILITIES, SOCIAL
CLASS, AND EXPOSURE TO ENGLISH IN
CHINESE FOURTH GRADERS¹


LELAND Y. YEE² AND ROLFE LaFORGE


California State University, San Francisco


This study attempted to examine the relation between mental
abilities as defined by raw scores on the Wechsler Intelligence
Scale for Children (WISC), social class, and exposure to English
in 53 American-born Chinese 9- to 10-year-olds attending a
private school in San Francisco. From canonical, stepwise, and
multiple regression analyses, no relation was found between
social class and the 12 WISC subtests, the WISC Overall total,
the six WISC Verbal subtests, the WISC Verbal total, or the WISC
Performance total. A significant relation was found, however,
between social class and the six WISC Performance subtests.
There was also a small but significant relation between the
exposure to English variables and the WISC Overall total. Among
the possible explanations proposed for the unexpected results
were: small, possibly biased sample, inappropriate social class
measures, and the nature of social class and its relation to
WISC scores within the San Francisco Chinese-American
population.


Definitive statements on social class differences in
intelligence are sparse. Hess (1970) attributed this to the
paucity of adequate theories to guide such research and to the
following methodological problems: the use of different
socioeconomic status (SES) measures, the absence of definitive
social class categories, the absence of attempts to partial out
the effects of variables related to social class, and the lack
of research on intrasocial class performance variability.

Hess (1970) claimed that there is a disparity range of 8 to 23
IQ points between upper and lower social class groups, depending
on the test used and the subjects' age level. Curry (1962), in
an attempt to identify the source of SES differences on
intellectual performance, divided his sixth-grade subjects into
different SES (lower, middle, and upper) and intelligence levels
(low, medium, and high; California Test of Mental Maturity). He
found several SES differences in reading and language
achievement. Where they were significant, the higher SES groups
consistently had the higher scores. The only exception was with
the "upper intelligence group." Curry suggested that at that
intelligence level, the effects of SES differences are masked by
the high intelligence.


¹ This study was revised for publication from a thesis completed
by the senior author at California State University, San
Francisco. The authors wish to thank Harry Osser and Becky Loewy
for their assistance in this study.

² Requests for reprints should be sent to Leland Y. Yee,
Department of Psychology, University of Hawaii, Honolulu,
Hawaii 96822.

Lesser, Fifer, & Clark (1965), using a revised version of the
Hunter College Aptitude Scales for Gifted Children, found that
middle-class Chinese first graders (six to seven years old)
scored significantly higher on the Verbal, Reasoning, Number,
and Space scales than did lower-class Chinese first graders.
However, the patterns of these scores remained the same for the
two groups. To determine whether such differences between SES
groups existed in even older subjects, Backman (1971), using
twelfth-grade subjects from Project Talent, a nationwide study
conducted in 1960, found significant differences in both the
patterns and levels of mental abilities between lower-middle and
upper-middle SES subjects. However, Backman (1971) noted that
although there was a significant difference on the patterns of
mental abilities across SES groups, the difference was too small
to be considered important.

It should be noted that most of the rela- 
tively few studies on the relation between 
social class and intelligence have relied on 
black subjects and/or white subjects (Chris- 
tiansen & Livermore, 1970), with very few 
using Chinese subjects (Hess, 1970). An ex- 
ception was the Lesser et al. (1965) study 
which found that different ethnic groups 
(Jewish, Puerto Rican, Chinese, and Negro) 
exhibited different levels and patterns of 
mental abilities. Other experimenters have 
also found these group differences (Anastasi, 1967; Eells,
Davis, Havighurst, Herrick, & Tyler, 1951; Hess, 1970).

The literature on whether bilingualism is 
an aid or a hindrance to intelligence test per- 
formance (Peal & Lambert, 1962; Soffietti, 
1955) is also equivocal. Darcy (1963), in her
review of the effects of bilingualism on the
measurement of intelligence, summarized the 
reasons for the inconsistent findings as fol- 
lows: (a) Subjects are introduced to the 
second language at disparate ages. (b) Sub- 
jects’ SES and cultural backgrounds are not 
matched. (c) Measurements of degrees of 
bilingualism are not uniform. (d) Intelli- 
gence tests used vary. (e) Different methods 
are used to teach subjects the second lan- 
guage. (f) Differential language handicap 
is confounded with educational retardation. 

It was hypothesized that: (a) there is a relation between social
class and mental ability as measured by the 12 WISC subtests
(Lesser et al., 1965, p. 52); (b) there is a relation between
social class and the six Verbal subtests, but no demonstrable
relation to the six Performance subtests (Lesser et al., 1965,
p. 5); (c) there is a positive multiple correlation between
social class and the Overall total, and between social class and
the Verbal total, but no demonstrable multiple correlation
between social class and the Performance total (Lesser et al.,
1965, p. 5); (d) exposure to English contributes importantly to
the multiple correlation between social class and the Overall
total, and to the multiple correlation between social class and
the Verbal total (Lesser et al., 1965, p. 9).

A multivariate analysis was used to test 
these hypotheses. For the first two hypoth- 
eses, relating social class and exposure to 
English to WISC scores, the dependent 
variables were the 12 subtests. These were 
separated into two subsets: six Verbal and 
six Performance subtests. For the last two 
hypotheses, relating social class and expo- 
sure to English to WISC totals, the depend- 
ent variables were the Overall, Verbal, and 
Performance totals. 


METHOD 


Subjects 


The sample consisted of 53 American-born
Chinese fourth graders (29 girls and 24 boys)
from one of the Freedom Schools³ in Chinatown,
San Francisco. Their mean age was 9½ years.


Variables 


Social class. A social class index was used in preference to an
SES index; strictly economic measures were omitted after
considering two earlier studies. In one study, Kahl & Davis
(1965) factor analyzed 19 stratification indices and found that
parent's occupational level was the dominant variable, with
education and residence (home types) as the next most important
variables. Kohn (1969) also concluded that income adds very
little to identifying class position. Since the Chinese people
seldom divulge their income to anyone, a social class index also
avoided the near impossibility of ascertaining the income of the
subjects' families.

The basis for designating social class came from three
variables: residence, education, and occupation of the father or
head of household. Residence was categorized on a 6-point scale
(6 = ownership; 5-1 = decreasing amounts of rent: $170 or more,
$150-$169, ..., $109 or less). Education was categorized in two
parallel 8-point scales in an attempt to equate education in
Hong Kong with education in the United States. The occupation
category was a 6-point scale derived mainly from Lesser et al.
(1965). Occupations not in the list of Lesser et al. (1965) were
classified according to their similarity (as defined by the U.S.
Government's Dictionary of Occupational Titles, 1965) with jobs
found in the list. Thus, a job title not found in the list of
Lesser et al. (1965) was given the same category number as other
job titles appearing with it in the Dictionary of Occupational
Titles (1965). Thus, although the social class indices were
traditional indices used by other investigators, the content
represented the unique social character of the Chinese in
Chinatown, San Francisco.

³ Late in 1971, in response to feelings against busing, Chinese
parents banded together to form several private schools which
have come to be known as Freedom Schools. Subjects were
recruited from a Freedom School, rather than a public school in
San Francisco, because of a complete ban on any type of
intelligence testing in San Francisco public schools.

Exposure to English. Thirty variables related 
directly or indirectly to exposure to English were 
assessed by means of a parent interview. From 
these, four variables were selected, mainly by 
factor analysis, to represent the exposure to 
English domain (see Results section). 


Procedure 


The race of the tester and testee was matched in 
this study, as both were Chinese. 

A month was spent becoming acquainted and 
establishing rapport with all of the fourth graders 
in the Freedom School. During recess and physical 
education periods, the experimenter organized re- 
lays and other group games. In the classroom, he 
helped each student with his classwork. 

Following that month, the teachers of these students were asked,
“Can (the student) carry on a normal conversation in English?”
and “Do you consider his understanding of the English language
adequate for his age?” Also, students were asked their
birthplace. If the first two questions were answered
affirmatively and if the student was U.S. born, he was later
given the WISC (in English). These three criteria were used to
provide some assurance that any subject's response would not be
a function of his lack of understanding the English language.
Birthplace was confirmed during the parent interview. From a
total of 84 fourth graders, 53 satisfied the criteria.

Following the administration and scoring of the WISC, each
parent was interviewed. The interview schedule consisted of five
sections (family structure, school experience, bilingualism, how
subjects spent their free time during weekdays and weekends, and
information relevant to the education, occupation, and residence
categories). A conversation with the parents prior to
administering the interview schedule was held to decide whether
to use the English or the Chinese version.


RESULTS 


All families were intact, except for three where the father was
deceased. The subjects had, on the average, .94 brothers, 1.2
sisters, and .21 older relatives (typically grandparents) living
with their family. Forty-one subjects had attended one of the
two public elementary schools at the center of Chinatown (where
over 90% of the students are Orientals) prior to their present
enrollment in the Freedom School. On the average, the subjects
had attended 3.5 months of nursery school and 2.2 years of
Chinese language school. (These schools are in session daily
after English school.) The average percentage of English spoken
at home varied from 75.6% (subject) to 28.3% (father) to 16.8%
(mother). All the parents believed that bilingualism was
important, but 34 felt that English should be given greater
emphasis than Chinese.

Parents reported an average of 4.8 hours of free time available
for the subjects on a weekday and 11.6 hours during the weekend.
Of those amounts, 2.9 hours were spent watching television
during a weekday and 5.1 hours during the weekend. The subjects
went approximately 1.3, .17, and 2.3 times per month to a
Chinese movie, to an American movie, and to the library,
respectively. Eight other exposure to English variables, such as
amount of free time spent reading English or Chinese books,
listening to the radio, etc., provided little response
variability and were dropped from further analysis. On the
social class variables, it was found that the average category
was 2.2 for occupation (e.g., skilled worker), 3.0 for parent
education (eleventh grade), and 3.5 for residence ($150 rent).

Three exposure to English variables, (a) percentage of English
spoken at home by the subject's older brothers and sisters, (b)
percentage of English spoken at home by subject's younger
brothers and sisters, and (c) percentage of English spoken at
home by subject's older relatives, were not considered in the
analysis because not all subjects had brothers, sisters, or
older relatives. However, the influence of the languages used by
the siblings and older relatives was represented by the
variable, average percentage of English spoken at home by the
parents, siblings, and older relatives.

Several factor analyses were performed to reduce the number of
exposure to English and social class variables before
correlating them with the WISC scores. The factor analysis which
combined the remaining 19 exposure to English variables with the
three social class variables is reported. All principal
components with latent roots greater than one were extracted and
rotated according to the varimax criteria (Morrison, 1967,
pp. 229, 286). The latent roots greater than one were 4.26,
2.73, 1.92, 1.88, 1.45, 1.34, 1.14, and 1.07. The cumulative
percentage of variance accounted for by these eight principal
components was 72%. The rotated factors appear in Table 1.


TABLE 1
FACTOR LOADINGS OF THE EXPOSURE TO ENGLISH AND SOCIAL CLASS VARIABLES

                                                            Factors
Variables                                       1     2     3     4     5     6     7     8

Sex of subject (0 = female, 1 = male)          .14  −.15  −.52  −.60  −.07  −.27  −.26   .19
Presence of father                             .11  −.11  −.01  −.08   [?]   .07   .10  −.05
No. brothers                                   .29   .08   .07  −.19   .09  −.02  −.06  −.72
No. sisters                                    .04  −.00   .27  −.10  −.08   .83   .07   .14
Percentage of English-speaking students
  in subject's school                          .01   [?]   .85  −.09  −.01   .08  −.07  −.24
Six months period in nursery school            .03   .08  −.10   .88   .03  −.14  −.10   .16
Years in Chinese school                       −.44   .02  −.07   .61   .02   .10   .61   .12
Percentage of English spoken at home
  by subject                                   .62   .00  −.25   .17  −.30  −.02  −.27  −.33
Percentage of English spoken at home
  by father                                    .86   .07   .04   .09   .16   .08   .05  −.09
Percentage of English spoken at home
  by mother                                    [?]   .04  −.11  −.17   .16  −.11  −.03   .06
Average percentage of English spoken
  at home by subject's parents, siblings,
  and older relatives                          .91   .02   .04  −.08  −.18   .15  −.10  −.17
Parent's emphasis of either Chinese (0)
  or English (2) or both (1)                  −.14   .13   .24  −.01   .03   [?]   .07  −.76
Hours of free time subject has during a
  weekday                                      .12   .50   .10  −.01   .02   .02  −.66  −.07
Hours of free time subject has during a
  weekend                                      .00   .10  −.03   .09  −.05  −.04  −.86   .08
Hours subject spent watching television
  during a weekday                             [?]   .68  −.14  −.17   .13  −.12  −.35  −.21
Hours subject spent watching television
  during a weekend                            −.10   .82  −.10   .10  −.02   .02  −.05   .02
No. times in a month subject goes to
  Chinese movies                              −.12   .08  −.46   .02   .21   .66   .09   [?]
No. times in a month subject goes to
  American movies                              .06   .33   .10  −.20  −.50  −.32   .26   .20
No. times in a month subject goes to the
  library                                     −.01   .32   [?]   .46  −.06   .48  −.30   .08
Occupation of head of household                .63   .15   .28  −.18   .34  −.22  −.04   .18
Education of head of household                 .59   .08   .03   [?]   .48  −.20  −.02   .19
Residence of head of household                 [?]   [?]  −.01  −.02   .42   [?]   [?]   .25

Note. Italics indicate factor loadings of greater than .45.
Factor 1 = Average Percentage of English Spoken at Home by
Subject's Parents, Siblings, and Older Relatives; Factor 2 =
Free Time, Including Time Spent Watching TV; Factor 3 = Distance
from Chinatown; Factor 4 = Formal Instruction Outside Elementary
School.


Exposure to English variables with the highest loading on each
rotated factor were selected as a “cluster” to represent that
factor (see variables in italics in Table 1). It was decided
that the 19 exposure to English variables would be represented
by four clusters: Average Percentage of English Spoken at Home
by Subject's Parents, Siblings, and Older Relatives (AV %); Free
Time, Including Time Spent Watching TV (FT); Distance from
Chinatown (D-CHINA); and Formal Instruction Outside Elementary
School (FOR-INST). Only four clusters were selected because
factors beyond the fourth were either uninterpretable (Factor
6), overlapping with earlier factors (Factor 7), or
representative of very few subjects (Factors 5 and 8). Each
cluster score was the simple sum of the exposure to English
variables loading above .45 on the corresponding varimax rotated
factor.
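
The 72% figure reported for the eight principal components can be checked from the latent roots. The sketch below assumes that the analysis was run on the correlation matrix of the 22 variables (19 exposure to English plus 3 social class), so that total variance equals the number of variables; that assumption, and the function name, are not stated in the text.

```python
# Latent roots greater than one, as reported in the text.
latent_roots = [4.26, 2.73, 1.92, 1.88, 1.45, 1.34, 1.14, 1.07]

# Assumption: 19 exposure to English variables + 3 social class variables,
# each standardized, so total variance equals the number of variables.
n_variables = 19 + 3


def cumulative_variance_share(roots, n_vars):
    """Proportion of total variance carried by the retained components."""
    return sum(roots) / n_vars


share = cumulative_variance_share(latent_roots, n_variables)

if __name__ == "__main__":
    print(f"{100 * share:.1f}%")
```

Under this assumption the roots sum to 15.79, giving about 71.8% of the variance, which rounds to the reported 72%.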

In the social class domain, occupation and education correlated
.55; they correlated .42 and .33, respectively, with residence.
Hence, occupation and education (occupation-education) were
added together to form a single summary social class variable,
while residence was retained as a second social class variable.
These two social class variables and the four exposure to
English clusters were used thereafter in the analyses as
independent variables to relate to the dependent variables (12
subtests, six Verbal subtests, six Performance subtests, and
three totals).

The intercorrelations of the exposure to English and social
class variables were generally low. The six intercorrelations of
the four exposure to English variables were less than or equal
to .15, except for FOR-INST with AV % (−.25) and with D-CHINA
(−.17). Five of the nine correlations involving the social class
variables were less than .15; only two were at all large, both
involving occupation-education (.44 with AV % and .43 with
residence). Residence itself correlated .19 with AV % and −.22
with FT.

The intercorrelations of the six Verbal subtests ranged from
zero (.01) to moderately high (.62), with Vocabulary showing the
highest correlations with the other Verbal subtests (.41 to
.62), except with Digit Span (−.13). Digit Span also showed
essentially zero correlations with all other subtests. The
intercorrelations of the Performance subtests were less than .30
in all but one case; Block Design correlated .40 with Object
Assembly. Coding and Mazes showed essentially zero correlations
with all other subtests. The correlations of Verbal with
Performance subtests were likewise low, with only six in the .30
to .42 range, while Similarities and Block Design showed the
highest correlations.

The correlations between the six independent variables and the
WISC subtests were very low. The highest were between D-CHINA
and Arithmetic (−.42), Information (−.31), and Vocabulary
(−.31). FOR-INST correlated .32 with Picture Arrangement, and FT
correlated .34 with Comprehension and .36 with Mazes.


TABLE 2
INTERCORRELATIONS OF THE COMBINED EXPOSURE TO ENGLISH, COMBINED SOCIAL CLASS,
AND WECHSLER INTELLIGENCE SCALE FOR CHILDREN (WISC) TOTALS

                                   Exposure to English     Social class        WISC
Variable                           1     2     3     4      5     6      7     8     9

Exposure to English
  1. AV%                         1.00   .13   .15  −.25    .44   .19    .05  −.16  −.10
  2. FT                           .13  1.00  −.04  −.10   −.01  −.22    .26   .14   [?]
  3. D-CHINA                      .15  −.04  1.00  −.17    .09   .05   −.35  −.35  −.42
  4. FOR-INST                    −.25  −.10  −.17  1.00   −.13  −.06    .15   .33   .32
Social class
  5. Occupation-Education         .44  −.01   .09  −.13   1.00   .43    [?]  −.03   .00
  6. Residence                    .19  −.22   .05  −.06    .43  1.00    [?]  −.16  −.06
WISC
  7. Verbal Subtest Total         .05   .26  −.35   .15    [?]   [?]   1.00   [?]   [?]
  8. Performance Subtest Total   −.16   .14  −.35   .33   −.03  −.16    [?]  1.00   .90
  9. Overall Total               −.10   [?]  −.42   .32    .00  −.06    [?]   .90  1.00

Note. AV% = Average Percentage of English Spoken at Home by the
Subject's Parents, Siblings, and Older Relatives; FT = Free
Time, Including Time Spent Watching TV; D-CHINA = Distance from
Chinatown; FOR-INST = Formal Instruction Outside Elementary
School.


To test the significance of the relation between the six
independent variables and the various subsets of the 12 WISC
variables, the procedure outlined by Morrison (1967, p. 212) for
testing the independence of k sets of variables was used.
Testing the first hypothesis entailed computing three
determinants: those of the intercorrelations of the 12 dependent
variables, the six independent variables, and all 18 variables.
The result was nonsignificant (χ² = 76.32, df = 72, p > .05); no
relation was demonstrated between the 12 WISC variables and the
six independent variables. A similar test of the first part of
the second hypothesis produced a similar result; no relation was
demonstrated between the six independent variables and the six
Verbal subtests (χ² = 49.76, df = 36, p > .05). The latter part
of the second hypothesis was rejected (χ² = 78.72, df = 36,
p < .05). Thus the predictions made in both parts of the second
hypothesis were reversed by statistical findings.
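
The direction of each of the three chi-square results can be verified from the reported statistics and degrees of freedom. The sketch below uses the closed-form upper tail of the chi-square distribution that holds when df is even (both 72 and 36 are), and takes df as the product of the two set sizes. The function and dictionary names are hypothetical, and this is a check on the reported values, not a reproduction of Morrison's procedure.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2m the tail has the closed form
        exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = math.exp(-half)  # k = 0 term
    total = term
    for k in range(1, df // 2):
        term *= half / k    # build (x/2)**k / k! incrementally
        total += term
    return total


# (chi-square, size of dependent set, size of independent set), from the text.
reported = {
    "12 WISC subtests": (76.32, 12, 6),
    "six Verbal subtests": (49.76, 6, 6),
    "six Performance subtests": (78.72, 6, 6),
}

p_values = {
    name: chi2_upper_tail_even_df(stat, p * q)
    for name, (stat, p, q) in reported.items()
}

if __name__ == "__main__":
    for name, p in p_values.items():
        print(f"{name}: p = {p:.4f}")
```

With these inputs only the Performance subtests result falls below .05, in line with the nonsignificant outcomes described for the first two tests.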

To examine the third hypothesis, three separate stepwise
regression analyses were performed. All F values were
nonsignificant. The six correlations between the two social
class variables and the three WISC totals were less than .17.
Neither social class measure correlated significantly with the
three totals. Therefore, the first two parts of the third
hypothesis were not supported, but the third part, that social
class would not be significantly related to the scores on the
Performance subtests, was confirmed.

Finally, to test the fourth hypothesis, three multiple
regression analyses were performed. It was found that there was
a significant relation (p < .05) between the six independent
variables and the Verbal total. These same six independent
variables also significantly (p < .05) predicted the Performance
total and the Overall total (p < .01). All three F values barely
equal the significance value. The fourth hypothesis as stated
depended on a relation between social class and WISC; but no
such relation appeared. Moreover, the three correlations of AV %
with the WISC totals were all less than .17. At most, it can be
said that there was a relation between the other three exposure
to English variables and the WISC totals. From stepwise
regression analysis, one exposure to English variable (D-CHINA)
correlated −.35 with the Verbal total. When two variables were
entered (FT and D-CHINA), the multiple correlation with Verbal
total was .410. D-CHINA correlated −.35 with the Performance
total, and −.42 with the Overall total. Including FOR-INST
resulted in a multiple correlation of .47 with the Overall
total. (All multiple correlation coefficients cited were
adjusted for df.)


Discussion 


The present study was not expected to reject all of its
hypotheses or, at best, to find only borderline support for
them. Previous studies
have demonstrated significant differences in 
intellectual abilities between individuals 
from different social class levels (Burnes, 
1970; Eells et al., 1951; Lesser et al., 1965). 

Several explanations were considered but rejected through
further analyses. It was noted earlier that because education
and occupation correlated highly (.55), their sum was used as a
single social class variable. Consideration was given that this
procedure might have contributed to the nonsignificant results.
However, for occupation, the highest correlation with a WISC
subtest was with Picture Arrangement (.12); the lowest was with
Information (−.02). For education, the highest was again with
Picture Arrangement (.15); and the lowest was with Arithmetic
(.06). Thus, no information was lost in the process of combining
the two variables.

Another possible explanation for the low 
correlations was sample homogeneity. To 
evaluate this possibility, all the subjects’ 
raw scores were transformed into IQ scores. 
The mean IQs were 98.6, 110.7, and 104.8 
for Verbal, Performance, and Overall scores, 
respectively. This is contrasted with the 
mean IQ of 100 established by Wechsler. The 
standard deviations were 17.6, 18.7, and 17.5 
for the Verbal, Performance, and Overall 
scores, respectively. These compare closely 
to the standard deviation Wechsler desig- 
nated for his test, 15. 

With these explanations rejected, there are still a number of
remaining explanations. Returning to sample homogeneity, it is
possible, of course, that the sample is more homogeneous in
social class or exposure to English than the Chinese-American
society as a whole, since all parents in the sample chose to
send their children to a Freedom School rather than have them
bused to public schools lacking a Chinese-American orientation.
In addition, due to the small N, many of the factor loadings in
the .50s and .60s could be the result of chance fluctuation. To
give some impression of the magnitude of uncertainty, consider
that a .95 confidence interval for estimating a single
correlation coefficient would allow an error of approximately
±.28.
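
The size of this margin can be illustrated with the Fisher z transformation, under the assumption that N = 53 and that the correlation being estimated is near zero. The text does not say how the ±.28 figure was obtained, so the sketch below shows one standard route to a value of that order rather than the authors' own computation; the function name is hypothetical.

```python
import math


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation coefficient.

    Uses the Fisher z transformation: atanh(r) is treated as normal with
    standard error 1 / sqrt(n - 3), and the limits are transformed back.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Half-width on the z scale for the sample of 53 subjects.
half_width_z = 1.96 / math.sqrt(53 - 3)

# Interval for a true correlation near zero.
low, high = fisher_interval(0.0, 53)

if __name__ == "__main__":
    print(f"z-scale half-width: {half_width_z:.3f}")
    print(f"interval for r = 0: ({low:.3f}, {high:.3f})")
```

On the z scale the half-width is about .277, and for r = 0 the back-transformed limits are about ±.27, so both are close to the figure quoted in the text. For larger correlations the interval becomes narrower and asymmetric.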

The low correlation between social class 
and WISC scores could be the product of 
inappropriate measures to identify different 
social classes. Possibly, the Residence scale 
did not adequately separate individuals 
along the social class dimension. Rents are 
high in the Chinatown area, but parents with 
limited English ability continue to live there 
because of the conveniences and security. 
Chinatown is a self-contained community. 
The food stores, department stores, banks, 
theatres, and parks are within walking distance.
Until recently, violence in the streets
was extremely rare.

The Occupation scale may also be inappropriate
because the status value of each occupation
was not derived from the values and
opportunities existing for this particular sub- 
culture. For example, there were parents in 
the sample with high school and/or college 
education who were cooks, while there were 
other cooks who had only an elementary 
level education and owned their homes. Sim- 
ilarly, although the Education scale rep- 
resented an attempt to equate education in 
the San Francisco area to the education in 
Hong Kong, the classification criteria were
still based on values foreign to the Chinese
population.

Finally, the results could be a true reflec- 
tion of the relation between social class and 
mental abilities in American-born Chinese, 
that is, there is little or no relation. Any 
potential differences might be masked by the 
parents’ attitudes toward learning or com- 
pensated by their emphasis on the impor- 
tance of bilingualism and biculturalism. This 
idea may also be used to explain the significant
relation between the predictor variables
(exposure to English and social class)
and the Performance subtests. Since most of
the parents reported that they spoke very
little English at home, they are probably
providing little direct opportunity for developing
competence in English. However, this
lack is compensated by their stress on studying.
Valuing study and achievement on tasks
set by adults, the child might do well on
tasks that are not as closely bound to the
English language as are the Verbal subtests.
This guess is given more credibility by the
.33 correlation between Performance total
and the exposure to English variable, FOR-INST.

The only other independent variable that
correlated as highly with the WISC scores
was the exposure to English variable, D-CHINA,
which correlated -.35, -.35, and
-.42 with the Verbal total, Performance
total, and Overall total, respectively. D-CHINA
was the sum of the value given to
the number of times per month the subject
attended a Chinese movie plus the value
given to the public school attended. In computing
this cluster score, the public school
variable was given a weight of two because its
factor loading was twice that of the Chinese movie
variable. Remarkably, the correlation was
negative, and thus opposite of what one 
might expect if exposure to English raised 
WISC scores. To explain the relation be- 
tween D-CHINA and the WISC scores, it 
may be hypothesized that the closer one is 
to the Chinatown area, the more likely he 
is to maintain some of the traditional values 
(to work and to study hard) of the Chinese. 
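
The D-CHINA cluster score described above is a simple weighted sum. A minimal sketch follows, assuming the two components are already coded as numeric values; the coding scheme itself is not given in this section, so the example inputs and the function name are hypothetical.

```python
def d_china_score(movie_value, school_value, school_weight=2):
    """Weighted cluster score: movie value plus twice the public school value.

    The weight of two follows the text (the school variable's factor loading
    was twice that of the movie variable). The input codings are assumed.
    """
    return movie_value + school_weight * school_value

# Hypothetical coded values for one subject.
example_score = d_china_score(movie_value=3, school_value=1)
print(example_score)
```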

Returning to the FOR-INST variable, 
one might expect that the more exposure one 
has to educational institutions, the higher 
the test scores. However, this held only for
the Performance and Overall totals, and not 
for the Verbal total. The FOR-INST vari- 
able consisted of three learning experiences: 
attendance at Chinese school, nursery school, 
and going to the library, and they correlated 
with the Performance total at .42, .28, and
-.06, respectively. Thus, it might be that
the crucial factor was attendance at Chinese
school. If so, this relation is in the opposite
direction to that predicted; emphasis on
Chinese, not English, raises Performance 
total. 

The lack of relation between AV % and 
the Verbal, Performance, and Overall totals 
is difficult to explain, considering the impor- 
tance investigators (Bernstein, 1961; Hess & 
Shipman, 1965) have placed on the verbal 
exchanges between the parents and the child 
toward developing his learning skills, that is, 
what and how he learns. However, if one 
were to examine some of the English verbal 
exchanges between the parents and the sub- 
jects, one might describe the parents’ speech 
as “restricted” (Bernstein, 1961), that is, 
English speech characterized by simple 
sentences with few modifiers and lacking in 
precision of meaning. In addition, during the 
parent interview, the parents’ English speech 
tended to be more heavily accented than the 
children’s (from the experimenter’s impres- 
sions). It thus becomes more understandable 
that parents’ restricted English speech would 
not improve the subject’s performance on the 
WISC. 

The last combined exposure to English 
variable was FT. Here the correlations 
with the Verbal, Performance, and Overall 
totals were .26, .14, and .22, respectively, and 
were treated as indicating no relation. 


Future Prospects 


There is clearly a need to replicate this 
study with a larger, less select sample. This 
study only begins the examination of cul- 
tural and class differences as they are related 
to mental abilities of Chinese-Americans. In 
addition to replicating, future studies should 
compare and contrast the patterns and 
levels of abilities using different tests than 
those used in the present study, with several 
ethnic and social class groups, as in the
Lesser et al. (1965) study. Such studies necessitate
a multicultural research staff. Not
only the testers but also the analyzers
and interpreters of the results must be sensitive to
the subjects’ cultural uniqueness, as in this
study.

Shifting from a multicultural to a multi- 
class orientation, one can also begin to explore
the differences in mental abilities
across social class levels and the interactions
of class and cultural differences as they relate
to mental abilities.

After the identification of these differ- 
ences, one can begin to examine possible 
sources of the differences. In this study of 
Chinese subjects, several indices of exposure 
to English were correlated with the Verbal, 
Performance, and Overall totals. These in- 
dices were composed to reflect the apparent 
factor-analytic unities in this ecology. 
Would similar composite indices emerge in 
other, larger studies? How could an exhaus- 
tive listing of cultural variables not found or 
explored in this study contribute signif- 
icantly and better represent cultural and 
class differences? Are variables responsible 
for cultural differences independent of vari- 
ables responsible for class differences? How 
should cultural definitions of class be in- 
troduced, if at all? How can the issue of 
genetic determination best be introduced? 
While the present study hardly helps to 
answer such questions, it does contradict 
the usual finding that higher social class and 
greater exposure to English lead to higher 
WISC scores in minority groups. Hence, it 
may help to refute an overly simplistic 
explanation of average group differences in 
WISC scores. 


EFFECT OF HYPOTHESIS/TEST TRAINING
ON READING SKILL

University of Minnesota


A task analysis on the components underlying the hypothesis/test 
model of word recognition generated seven subskills. In Experiment 1, 
60 retarded children were randomly assigned to an experimental and 
control group. Experimental subjects were taught hypothesis/test 
word-recognition subskills. In Experiment 2, 40 normal children were 
randomly assigned to a 2 X 2 factorial design. The factors were train- 
ing on (a) hypothesis/test subskills and (b) recognizing high-frequency 
words flashed with a projector. Results of both experiments indicated 
hypothesis/test training produced significantly superior word recogni- 
tion and comprehension. Discussion centers on the need for developing 
subskills and strategies to the point where they are a unitary process 
that can be performed automatically without attention, so as to permit
rapid comprehension.


An important aspect of reading pedagogy 
is the discovery of skills that are important 
but not taught. At the present time, reading 
methods tend to focus on initial skills, and 
we need methods that train more sophisti- 
cated strategies. 

Possible reasons for the dearth of research 
on training intermediate and fluent reading 
processes include the following: Reading, 
as a covert process, is difficult to study; 
descriptions of the word-recognition process 
appear to be in conflict (Cattell, 1885; 
Gough, 1972; Kolers & Lewis, 1972); and 
until recently we did not have explicit, test- 
able models of recognition and information 
processing (Williams, 1971). The recent
advances in models of recognition, reading, 
and information processing (Davis, 1971; 
Singer & Ruddell, 1970) have been most 
useful in suggesting explicit pedagogical 
approaches for teaching more sophisticated 
reading strategies than has heretofore been 
the case. 

The research reported here operationalizes 
a method for teaching intermediate skills 
based on a partial model of word recognition 
(Samuels, 1970). The partial model of word 
recognition is not an entirely new idea. In 
many ways it is similar to the Halle and 
Stevens (1964) analysis by synthesis model 
of speech recognition and to Bruner's (1951) 
hypothesis testing and Solley and Murphy's 
(1960) trial and check. Ryan and Semmel 
(1969) have also written about the word- 
recognition process in which they state, 
“Expectancies about syntax and semantics
within context lead the reader to form hy- 
potheses which can be confirmed or not 
confirmed with only a small portion of the 
cues available in the text [p. 59].” What
these models appear to acknowledge is that 
recognition is a constructive process in which 
output is different from and greater than the 
input. Implications of these models for
reading would be that, given context and
but a single letter from a target word, the
reader ought to be able to generate the 
entire word from this minimal, visual-letter 
cue. 

According to the partial model of word 
recognition, four processing stages are in- 
volved in recognizing a target word. In the 
first stage the words preceding the target 
word are read for meaning. This information 
is used in the second stage to generate one 
or more hypotheses as to the identity of the 
target word. In the third stage, visual in- 
formation consisting of one or more letters 
is picked up from the target word and tested 
against the predicted word. In the final 
stage, the hypothesis is accepted or rejected 
depending on whether the word fragments 
perceived match the expected word. Speed 
of recognition is determined partly by the 
amount of visual information from the tar- 
get word necessary for verifying a predic- 
tion. The less visual information required, 
the faster is the recognition. 
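
The four stages can be sketched as a toy program. This is an illustration under simplifying assumptions, not the authors' model: the hypotheses are supplied as a hand-written list standing in for what context would generate (Stages 1 and 2), and visual information is assumed to be picked up one letter at a time from the left. The function name is hypothetical.

```python
def recognize(hypotheses, target):
    """Toy hypothesis/test loop.

    Returns (word, letters_used); word is None when every hypothesis
    has been rejected by the visual information picked up so far.
    """
    candidates = list(hypotheses)            # Stage 2: predictions from context
    for n_letters in range(1, len(target) + 1):
        fragment = target[:n_letters]        # Stage 3: pick up visual information
        # Stage 4: reject hypotheses that do not match the fragment.
        candidates = [w for w in candidates if w.startswith(fragment)]
        if not candidates:
            return None, n_letters
        if len(candidates) == 1:
            return candidates[0], n_letters  # accept the surviving hypothesis
    word = target if target in candidates else None
    return word, len(target)

# Hypothetical predictions for "My mother sleeps on her ___."
print(recognize(["bed", "couch", "back"], "bed"))
print(recognize(["bed"], "bed"))
```

In this sketch a more constraining context leaves fewer hypotheses, so fewer letters are needed before one is accepted, which mirrors the claim that less visual information means faster recognition.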

Predictions based on this model suggest 
that skilled readers have better word recog- 
nition because of superior processing strat- 
egies at each of the four stages described 
above. Tests of this hypothesis comparing 
adults and children (Samuels & Chen, 1972) 
and good readers versus poor readers at the 
elementary school level (Begy & Samuels, 
in press) indicated that the better readers 
were more accurate in generating the target 
word when given context; they required 
less visual information from a target in 
order to identify the word; they were better 
able to identify letters from the target word 
which could serve as a cue to recognition 
(in the absence of total recognition); and
they were more willing to alter an incorrect 
identification of a target word. 

While this partial model of word recog- 
nition has proven useful, it has a major 
problem. The amount of time necessary to 
generate a prediction is in the neighborhood 
of 200 milliseconds (Posner & Boies, 1971). 
Since it takes about 250 milliseconds or less 
to recognize a word in isolation, this model 
does not account for the high-speed recog- 
nition responses of fluent readers reading 
meaningful material (Cosky & Gough, 
1973). While the hypothesis/test procedure
is too slow for what goes on in fluent reading,
the model seems to account for intermediate
levels of reading skill. For the high-speed
recognition responses of fluent readers,
the model requires further refinement.

Other researchers have tried to develop 
new reading methods based on the use of 
context for predicting a target word. Many 
of these experimental methods used the 
cloze technique but failed to find improve- 
ment in reading as a result of this practice 
(Bloomer et al., 1966; Blumenfield & Mil- 
ler, 1966; Friedman, 1964; Guice, 1969; 
Schneyer, 1965). It should be noted, how- 
ever, that these studies did not provide 
training on the subskills necessary for suc- 
cessful cloze performance and the failure to 
find differences may have been due to the 
students’ inability to perform and profit 
from cloze-type exercises. Kennedy and 
Weener (1973), however, did find improved 
comprehension as a result of training on 
cloze passages even though they did not 
provide instruction in cloze subskills. 

For the study described in this report, a
task analysis was done on each stage of the 
partial model of word recognition. Instruc- 
tional sequences were then designed to teach 
the subskills identified by the task analysis. 
For example, methods were designed for in- 
structing children how to predict from con- 
text and how to use minimal visual informa- 
tion in verifying a prediction. Pilot work 
and models of speech recognition (Halle & 
Stevens, 1964) suggested that these sub- 
skills were already mastered in the auditory- 
speech mode. Consequently, instruction was 
sequenced to start with skills in the auditory-speech
mode and to progress to the
skills required for recognizing printed words. 

The purpose of the two studies reported 
here was to test the extent to which train- 
ing on the subskills underlying the partial 
model of word recognition would lead to 
improved reading. One of the dependent 
variables investigated was accuracy of word 
recognition. Another was speed of word rec- 
ognition, since the model suggests that using 
minimal visual information should lead to 
increased word-recognition speed. The third 
dependent variable was comprehension as
measured by a modified cloze. Bormuth's 
(1966) research indicates that cloze tests 
are measures of reading comprehension. To 
enable subjects to use the hypothesis/test 
strategy of recognition, the subskills identi- 
fied by the task analysis were taught to the 
experimental subjects. In Experiment 1, 
mentally retarded subjects were used. Since 
failure to find differences in this experiment 
may have been due to failure to adequately 
teach the subskills, tests were given to de- 
termine if the subjects had learned the sub- 
skills. In Experiment 2 a replication and 
extension were done using normal subjects. 
In addition to hypothesis/test training, the 
second experimental factor was training sub- 
jects to recognize high-frequency words 
flashed with a projector. McGinitie (1973) 
stated that if one knows the 1,000 high-fre- 
quency words in English, one can identify 
three fourths of the words used in school 
books. Consequently, to aid the use of con- 
text, some students in the second study were 
given extensive training with these basic 
words. Shankweiler and Liberman (1972) 
found that the ability to recognize words in 
isolation is highly related and fundamental 
to the ability to read in context. 


EXPERIMENT 1 
Method 


Subjects 


Sixty mentally retarded children were randomly 
assigned to an experimental and control group, 
30 to a group. The mean IQ for the retarded subjects
was 72 and the mean age was 10.3 years. Experimental
subjects were trained on the subskills
underlying the hypothesis/test strategy of word
recognition. Control subjects were given regular
reading instruction. Instructional time was carefully
controlled for both groups. Training for both
groups lasted 14 weeks, approximately four hours
a week. Experimental subjects were taught by
three undergraduate assistants. Each assistant
worked with 10 retarded subjects. The undergraduates
had no previous teaching experience nor did
they have a background in reading pedagogy.
Each week the assistants were given 10-15-minute
explanations of how to teach the subskills. Control
subjects were taught by three experienced certificated
teachers with training in special education.
Each control teacher worked with 10 subjects. To
control possible Hawthorne effects, a graduate research
assistant, with experience in teaching reading,
gave each of the control subgroups an additional
hour of reading instruction each week. The
assistant had the control subjects read from their
reading books and their comprehension was
checked.


Design 
The design was a simple randomized design.


Procedure (Independent Variable) 


Hypothesis/test. The following subskills and 
methods of instruction were derived from a task 
analysis of the model of word recognition. (For 
examples of materials and methods used to teach 
these subskills see Archwamety & Samuels, 1973; 
Appendix A, Dahl, Samuels, & Archwamety, 1973.)
The seven component skills derived from the model
are as follows: 

1. Training on the ability to say a word given
an initial sound. The subject was given drills of
the following nature:

Stimulus: The experimenter says, “Tell me a
word starting with the sound /p/.”
Response: The subject gives a word starting
with that sound. Any word starting
with the sound is acceptable.


2. Training on the ability to determine the be- 
ginning letter of a spoken word. The subject was 
given drills of the following nature: 


Stimulus: The experimenter asks, “What is the 
first letter in the word girl?” 

Response: The subject gives the name of the 
initial letter in girl, that is, g. 


3. Training on the ability to recognize visually 
the initial letter of a word presented orally. The 
subject was given drill of the following nature: 


Stimulus: The experimenter says, “What is the 
first letter in the word boy?” Then 
the experimenter shows a card with 
the letters b, c, t, r printed on it. 

Response: The subject points to the letter b. 


4. Training on the ability to use auditory con- 
text to predict words that could logically follow. 
The subject was given a drill of the following nature:

Stimulus: The experimenter says, “My mother
sleeps on her ___.”

Response: The subject predicts the missing 
word. Any word that makes sense is 
acceptable. 


5. Training on the ability to use auditory context
to predict word(s) that could logically follow
in a sentence upon hearing just the initial sound of the
word. The subject was given drill of the following
nature:

Stimulus: The experimenter says, “The cat ran
after the /m/___.”
Response: The subject predicts what the missing
word might be. The words must begin
with the /m/ sound and make
sense in the context.


6. Training on the ability to use visual context
to predict word(s) that would logically follow in
a sentence without seeing the initial letter of the
word. The subject was given drill of the following
nature:

Stimulus: The experimenter shows the following
in printed form: The children
open the ___.
Response: The subject is asked to read and
predict the word in the blank. The
experimenter tells the subject whatever
word the subject cannot read in
context. Any word that makes sense
is acceptable.


7. Training on the ability to use visual context
to predict word(s) that could logically follow in a
sentence when given the initial letter of the target
word. The subject was given drill of the following
nature:

Stimulus: The experimenter shows the following
printed form: The girl ate the
b___.
Response: The subject is asked to read this and
predict the word in the blank. Any
word beginning with b that makes
sense is acceptable.


The assistants were told that accuracy in re- 
sponse was desirable but not sufficient. What was 
desired was accuracy and speed of response. These 
goals of accuracy and speed were emphasized dur- 
ing instructional sessions with the experimental 
subjects. 


TABLE 1
WORDS TACHISTOSCOPICALLY PRESENTED WITH
EACH EXPERIMENTAL CONDITION

Condition    Word: S1               Target word: S2

1            (none)                 pard
             (none)                 rabbi
2            (none)                 camp
             (none)                 fifty
3            cold                   snow
             green                  grass
4            blue                   ocean
             beautiful              song
5            Lemon has a salty      taste
             We heard a loud        noise

Procedure (Dependent Variables)


Speed of tachistoscopic word recognition. The
two groups of subjects (experimental and control)
were tested on speed of word recognition, using a
two-channel, scientific prototype tachistoscope to
present stimuli. Each subject was tested under five
conditions. The five conditions and the words used
in the conditions may be seen in Table 1. Under
each of the conditions there were two target words
represented by S2 in Table 1.

Description of five conditions. The target words
of Condition 1 were low-frequency words that occur
fewer than six times in one million words (Thorndike
& Lorge, 1944). The target words in other
conditions were higher frequency words, but their
frequency values were controlled to be approximately
equal to one another, about 400-600 occurrences
per million words. These frequency values
are averages of the Thorndike-Lorge word-frequency
count.

Words in Condition 3 and Condition 4 were selected
such that the associative values between
them in Condition 3 were high (47 between cold
and snow, 113 between green and grass) and those
between S1s and S2s in Condition 4 were low (0
between blue and ocean, 0 between beautiful and
song). These associative values were taken from
Palermo and Jenkins' (1964) word association
norms.

Administration of test. Each subject was given 
warm-up training on the tachistoscope to familiar- 
ize him with the task. The warm-up tasks were 
analogous to the conditions found in the test 
proper, but different words were used. 

The five conditions were randomly adminis- 
tered. The method of ascending limits was used in 
determining visual recognition threshold. That is, 
the first time the word appeared the exposure time 
was very short. With each subsequent presentation 
of the word the exposure time was increased until 
the subject gave two correct recognition responses. 
For purposes of analysis, the average visual duration
threshold of the two correct responses was
used.

As soon as the subject read the first S1 word of
a pair, it was terminated and then for 20 milliseconds
the target word was flashed. The subject
was then to identify that word. If he could not, the
word pair was presented again and again, with the
exposure duration of the target word increasing 10
milliseconds each time. An experimental event was
completed as soon as the subject could identify the
target twice.
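
The ascending-limits procedure described above can be sketched as a short loop. This is a simplified illustration, not the laboratory procedure itself: the subject is replaced by a hypothetical function that says whether the word is identified at a given exposure, and the upper cap on exposure time is an assumption added so that the loop always ends.

```python
def ascending_limits_threshold(can_identify, start_ms=20, step_ms=10,
                               needed=2, max_ms=2000):
    """Average exposure (ms) of the first `needed` correct identifications.

    Exposure starts at start_ms and rises by step_ms on every presentation,
    as in the text. max_ms is an assumed safety cap, not from the article.
    """
    correct_exposures = []
    exposure = start_ms
    while exposure <= max_ms:
        if can_identify(exposure):
            correct_exposures.append(exposure)
            if len(correct_exposures) == needed:
                return sum(correct_exposures) / needed
        exposure += step_ms
    return None  # threshold not reached within the cap

# Hypothetical subject who identifies the word at 40 ms or longer.
threshold = ascending_limits_threshold(lambda ms: ms >= 40)
print(threshold)
```

For this simulated subject the two correct responses fall at 40 and 50 milliseconds, so the recorded threshold is their average.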

Word identification in context: accuracy of recognition.
Materials consisted of two sets of 5 X
8 inch index cards, with 10 cards in each set. On each
card a single sentence was typed using an Underwood
primary typewriter. The last word in the
sentence was underlined in red and was considered
to be the target word while all words preceding the
target word were considered context. Sentences in
the first set were such that the context was so
compelling that the target word could hardly be
any other. An example is, “It is dark at night.”
In the second set the context was less compelling
as in, for example, “There are fish in the lake.”

The procedure consisted of showing the child 
the sentence and having the child read the sen- 
tence to the experimenter. Help was given on con- 
text words but no help was given on the target 
word. 

Modified cloze test of comprehension. A 122- 
word passage with 20 deleted words was shown to 
the student. Each deletion contained the first letter
of the target word, as exemplified in, “The
boys r___ their bicycles back h___.” Help was
given to the students on context words, but none 
was given on the target. Only exact replacements 
were counted as correct. 
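
The exact-replacement scoring rule can be made concrete with a small sketch. The answer key below is hypothetical (the article does not give the deleted words), and ignoring letter case and surrounding spaces is an assumption about how "exact" would be applied.

```python
def first_letter_cue(word, blank="___"):
    """Render a deleted word as its first letter plus a blank, e.g. r___."""
    return word[0] + blank

def score_cloze(answer_key, responses):
    """Count responses that exactly replace the deleted word.

    Case and surrounding whitespace are ignored (an assumption);
    synonyms and other sensible words are scored as errors.
    """
    return sum(
        1
        for answer, response in zip(answer_key, responses)
        if response.strip().lower() == answer.lower()
    )

# Hypothetical answer key and one subject's responses.
answer_key = ["rode", "home"]
responses = ["rode", "house"]
print([first_letter_cue(w) for w in answer_key])
print(score_cloze(answer_key, responses))
```

Here the second response fits the first-letter cue and the context but is still scored as wrong, which is what makes exact-replacement scoring a strict criterion.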


RESULTS 


Seven Component Skills of Hypothesis/Test 
Procedure 


A pretest was given on each of the sub- 
skills. Seven t tests were run comparing ex- 
perimental and control groups on each of 
these component skills. Significant differ- 
ences were not found between the two groups 
on any of the seven measures on either num- 
ber of correct responses or latency of re- 
sponse. 

At the end of the training period t tests
were again computed for each of the component
subskills. The experimental group
was significantly superior (p < .05 versus
p < .01) on accuracy criteria on six of the
seven component skills. The only skill where
no difference was found between the two
groups was Subskill 4 as described in the
method section. When one compares the
latency of a correct response, the experimentals
were faster (p < .05 versus p <
.01) on five of the seven measures. Those
subskills failing to reach significance were
Skills 1 and 5.

With regard to the question of whether these
subskills could be taught, the overall superiority
of the experimental group on accuracy and
latency criteria indicates that the method
used to teach the skills was effective.


Speed of Tachistoscopic Word
Recognition


A prerequisite for inclusion in the tachistoscopic
study was the ability to read all
the words shown with the machine. Only
20 subjects from the experimental and 20
from the control met this requirement. The
data for the speed of word recognition were
analyzed with a 2 X 5 analysis of variance
with repeated measures on word-presentation
conditions (see Table 2).

TABLE 2
SPEED OF TACHISTOSCOPIC WORD RECOGNITION
OF HIGH-FREQUENCY (HF) AND LOW-FREQUENCY
(LF) WORDS PRESENTED
WITH AND WITHOUT CONTEXT

                         Experimental group     Control group
Condition                Mean       SD          Mean       SD

Isolation (LF)           34.25      11.54       66.00      64.74
Isolation (HF)           33.50      15.50       78.25      99.20
One-word context (H)     47.50      73.70      118.50     123.65
One-word context (L)     81.50      89.92      117.25     163.99
Sentential context       45.50      56.61      105.25     187.82
Combined                 48.45      43.08       97.05      84.03

The between-subject source of variance 
on the analysis of variance indicated that 
the experimental group recognized the 
flashed words significantly faster than the 
control group (F = 5.03, df 1/38, p < .05).
The within-subject source of variance on 
the analysis of variance indicated that there 
was no difference among the five word- 
presentation conditions (F = 1.92, df 4/152, 
ns). The interaction effect was not signifi- 
cant (F < 1). 


Word Identification in Context 


Table 3 shows the mean number of correct 
responses given to the target words when 
they were preceded by either a compelling 
or ambiguous context. The 2 X 2 analysis of 
variance with repeated measures on one of 
the factors indicated that overall (i.e., com- 
pelling and ambiguous context combined) 
the experimental group identified target 
words more accurately than did the controls 
(F = 4.92, df 1/58, p < .05). The analysis
also indicated that identification was more
accurate under compelling context than
under ambiguous context (F = 46.79, df
1/58, p < .01). There was also a group by
condition interaction (F = 13.76, df 1/58,
p < .001). Consequently a test of simple
effects was done that indicated no significant
difference between experimental and
controls under compelling context. However,
when the target word was preceded by ambiguous
context the hypothesis/test group
was significantly superior at identifying
the word (F = 14.30, df 1/116, p < .01).

TABLE 3
SUMMARY SCORES ON WORD IDENTIFICATION
IN CONTEXT

                          Experimental group     Control group
Condition                 Mean       SD          Mean       SD

1. Compelling context     5.47       2.57        5.43       2.03
2. Ambiguous context      4.57       2.30        2.40       1.[illegible]


Modified Cloze Test of Comprehension 


The mean number of correct words given 
by the experimental group was 15.10 (SD = 
2.60) and the mean number for the control 
group was 10.67 (SD = 4.31). The t test
indicated superior performance for the experimental
group (t = 4.74, df 58, p < .01).
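
As a consistency check, the t value can be recomputed from the reported summary statistics (means of 15.10 and 10.67, SDs of 2.60 and 4.31, 30 subjects per group). The sketch below is an illustration only; the article does not say whether the SDs were computed with N or N - 1 in the denominator, so both conventions are shown, and the function name is hypothetical.

```python
import math

def t_from_summary(m1, s1, m2, s2, n, sd_uses_n=False):
    """Independent-groups t for two equal-sized groups, from means and SDs."""
    v1, v2 = s1 ** 2, s2 ** 2
    if sd_uses_n:
        # Convert N-denominator variances to unbiased (n - 1) estimates.
        v1 *= n / (n - 1)
        v2 *= n / (n - 1)
    standard_error = math.sqrt((v1 + v2) / n)
    return (m1 - m2) / standard_error

t_sample_sd = t_from_summary(15.10, 2.60, 10.67, 4.31, n=30)
t_population_sd = t_from_summary(15.10, 2.60, 10.67, 4.31, n=30, sd_uses_n=True)
print(round(t_sample_sd, 2), round(t_population_sd, 2))
```

Treating the SDs as N - 1 estimates gives a t of about 4.82; treating them as N-denominator values gives about 4.74, which agrees with the value in the text. Either way the conclusion of a reliable group difference is unchanged.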


EXPERIMENT 2 
Method 


Subjects 


The third-grade student body in a middle-class, 
suburban elementary school was given the Iowa 
Tests of Basic Skills and 36 of the poorest readers 
were selected as subjects. They were randomly as- 
signed to one of four groups. The mean IQ for the 
four groups was 107.00, and the mean reading 
achievement score was 3.05. 


Design 
A 2 X 2 factorial design was used. The factors
were hypothesis/test training and flashed word
recognition training.


Procedure (Independent Variables) 


Hypothesis/test training. The same subskills and 
methodology were used as described in Experiment 
1, method section. 

Flashed word recognition. The subjects who received
training in recognizing flashed words were
given practice in word recognition beyond accuracy.
Practice was accomplished by flashing
words with a carousel projector at a rate controlled
by a timer. The words were first exposed at a visual
duration threshold of 3 seconds and advanced to
shorter exposures (2 seconds, 1.5 seconds, 1 second,
.5 second). Change to faster rates was determined
by going through all 80 words in a tray perfectly
three times. When a subject was 100% accurate 
at the fastest rate, the subject selected a new tray 
of slides. The 1,600 different words in the carousel 
trays were selected from the Dale-Chall list of 
3,000 most common words. 

Teachers and time. The teacher who taught the 
experimental groups had six years of teaching 
experience; however, she had only 1 year of experience
at the grade level. The control teacher had
19 years of teaching experience about equally 
divided between second and third grades. The 
amount of time devoted to reading was the same 
for experimental and control groups. The experi- 
ment was run for seven months, at the end of 
which time the tests were given. 


Procedure (Dependent Variables) 


Tachistoscopic word recognition. The same procedure,
words, and equipment were used in Experiment
2 as were described for Experiment 1.

Modified cloze test of comprehension. A 610- 
word passage with 62 deletions was shown to the 
students. Each deletion contained the first letter 
of the target word. As in the previous study, help 
was given on context words but not on the target word.
Only exact replacements were counted. 


RESULTS 


To determine the initial comparability 
of the four randomly assigned groups, tests 
were run on several variables. No difference 
was found on The Iowa Tests of Basic 
Skills—Reading Comprehension; the Cog- 
nitive Abilities Intelligence Test; and a 
test of tachistoscopic word recognition. Although
no significant differences were found,
the control group that received no hypothesis/test
training had faster tachistoscopic
word-recognition scores on four of the five
treatments.


Tachistoscopic Word Recognition 


Means on speed of word recognition for
the four groups can be found in Table 4.

Speed of tachistoscopic word recognition
at the end of training was tested using a
three-factor analysis with repeated measures
on one of the factors. The two between-subject
factors were hypothesis/test
versus control and carousel word recognition
versus control. The within-subject factor
consisted of the five conditions of tachistoscopic
word recognition as shown in Table
1.

The analysis of variance on speed of word 
recognition used gain scores. It indicated on 
the between-subject source of variation that 
the experimental hypothesis/test trained 
group was significantly faster than its con- 
trol (F = 9.35, df 1/32, p < .01) but there 
was no difference between the group trained 
in carousel word recognition and its control 
(F < 1, df 1/32). On the within-subject 
source of variation there were significant 
differences in speed of recognition on the 


TABLE 4


SPEED OF RECOGNITION IN MILLISECONDS ON
TACHISTOSCOPIC TEST


                        Condition
Treatment
Pre
  H        105     91    115    115    110
  H̄         85     85    115    107    121
Post
  H         53     61     88     90    140
  H̄         55     73    106    108    175
Change
  H         52     30     27     25    -30
  H̄         30     12      9     -1    -54
Pre
  A        105     93    128    111    124
  Ā         85     84    100    111    107
Post
  A         63     74    103    109    174
  Ā         45     59     90     90    141
Change
  A         42     19     25      2    -50
  Ā         40     25     11     21    -34


Note. H = hypothesis/test training; H̄ = no
hypothesis/test training; A = flashed word-
recognition training; Ā = no flashed word-recog-
nition training.

a Isolated word, low frequency.

b Isolated word, high frequency.

c One word context, low association.

d One word context, high association.

e Sentence context.


TABLE 5


MEANS AND STANDARD DEVIATIONS ON CLOZE
TEST ERRORS


Treatment      X̄       SD      N
HA           11.44    3.98     9
HĀ            9.33    2.58     9
H̄A           16.22    4.76     9
H̄Ā           16.00    5.14     9
H            10.39    3.51    18
H̄            16.11    4.95    18
A            13.83    4.99    18
Ā            12.67    5.26    18


Note. H = hypothesis/test training; H̄ = no
hypothesis/test training; A = flashed word-recog-
nition training; Ā = no flashed word-recognition
training.


five repeated measures conditions of word 
recognition (F = 24.77, df 4/128, p < .01).
None of the interactions on either the be- 
tween- or within-subject variations were 
significant. 
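
The gain scores entered into this analysis can be sketched as pretest minus posttest latency, which is how the Change rows of Table 4 relate to its Pre and Post rows. The values below are the Pre and Post rows for the hypothesis/test trained group; the function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Mean recognition latencies (msec) for the hypothesis/test trained group,
# one value per tachistoscopic condition, in the column order of Table 4.
pre_ms = [105, 91, 115, 115, 110]
post_ms = [53, 61, 88, 90, 140]


def gain_scores(pre, post):
    """Return pre-minus-post differences; positive values mean faster recognition."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same conditions")
    return [before - after for before, after in zip(pre, post)]


h_gains = gain_scores(pre_ms, post_ms)

if __name__ == "__main__":
    print(h_gains)
```

A negative gain, as in the last condition, indicates that recognition was slower at posttest than at pretest.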


Modified Cloze Test of Comprehension 


Means and standard deviations on the 
cloze test errors can be found in Table 5. 

The two-way analysis of variance indi- 
cated that the hypothesis/test trained 
groups made significantly fewer errors than 
the controls (F = 14.65, df 1/32, p < .01). 
On the second factor, no difference was 
found between the groups given carousel 
word-recognition training and the controls 
(F < 1, df 1/32). The interaction effect was
not significant. 
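
The two main effects compare marginal means, which, with equal cell sizes, are simple averages of the cell means in Table 5. The sketch below assumes that the four cell rows of Table 5 are ordered H with A, H without A, no H with A, and no H without A; the dictionary keys and function name are illustrative only.

```python
# Cell means for cloze errors (9 subjects per cell), keyed by
# (hypothesis/test factor, flashed word-recognition factor).
cell_means = {
    ("H", "A"): 11.44,
    ("H", "no A"): 9.33,
    ("no H", "A"): 16.22,
    ("no H", "no A"): 16.00,
}


def marginal_mean(cells, factor_index, level):
    """Average the cell means that share one level of one factor.

    Valid as a marginal mean only because every cell has the same N.
    """
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


if __name__ == "__main__":
    for index, levels in ((0, ("H", "no H")), (1, ("A", "no A"))):
        for level in levels:
            print(level, marginal_mean(cell_means, index, level))
```

The averages agree, within rounding, with the marginal rows of Table 5: the hypothesis/test factor separates the groups by several errors, whereas the flashed word-recognition factor separates them by about one.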


DISCUSSION 


A partial model of word recognition was 
transformed by means of a task analysis 
into a series of subskills. This model uses 
hypothesis/test procedures for word recog- 
nition. The purpose of the two studies re- 
ported in this investigation was to deter- 
mine if children who were trained in these 
hypothesis/test subskills would be superior
in word recognition and comprehension. An
additional question investigated in Experi- 
ment 1 was whether the subskills could be 
taught effectively. 

Results of Experiment 1 indicated that 
the subskills could be taught effectively, 
in that the mentally retarded children who 
were in the experimental group were gen- 
erally superior to their controls in both 
accuracy and response latency on the tests 
of subskill mastery. Of greater concern was 
the performance of the experimentally 
trained subjects on the two tests of word 
recognition. One test consisted of recogniz- 
ing words presented with a tachistoscope. 
On this test the experimentally trained 
group was significantly faster in speed of 
word recognition. The second test of word 
recognition had two parts. In the first part 
the target word was preceded by a compel- 
ling context, whereas in the second part the 
target word was preceded by ambiguous 
context. The analysis indicated no difference 
between experimental and control groups on 
the compelling context but significantly 
superior performance by the experimental 
group on the ambiguous context, where 
preceding context did not readily elicit the 
target word. The cloze test of comprehension 
indicated the superiority of the hypothesis/ 
test trained experimental subjects. 

Results of Experiment 2, which used a 
normal population, were much the same. On 
the test of tachistoscopic word recognition,
the experimentally trained subjects were 
significantly faster, when one compared 
change in performance from pretests to post- 
tests. The cloze test of comprehension in- 
dicated the superiority of the experimental 
subjects. 

It should be pointed out that in Experi- 
ment 2 flashed word recognition was added 
to the design as a second factor. The ration- 
ale for including this factor is that in order 
to be able to use context as an aid in recog- 
nizing a target word, the reader must be 
able to recognize the words in context ac- 
curately and quickly. Training on this vari- 
able consisted of recognizing high-frequency 
words that were flashed with a projector. 
Comparing experimental and control groups
on this factor indicated no difference be-
tween them on tests of word recognition or
comprehension. 

Previous portions of this article outlined 
the rationale for teaching the hypothesis/ 
test word-recognition skill and the com- 
ponent subskills. At this point it is necessary
to mention that the goal in teaching this
word-recognition strategy is for each of
the subskills to be integrated into a higher- 
order, wholistic process. Hilgard and Mar- 
quis (1961) wrote: 


most learning is complex and requires the simul- 
taneous learning of several components. Possibly 
the clearest illustration of this is the appearance 
during learning of plateaus (Bryan & Harter, 1897) 
during which practice does not lead to improve- 
ment. In complex learning, plateaus often seem 
to be temporary periods devoted to the organiza- 
tion of component habits into larger units [p. 
127].... 


There are numerous psychomotor skills 
that, through extensive training, appear to 
be a single skill. The tennis serve, for ex- 
ample, appears to be one fluid motion, when, 
in fact, in teaching the skill the coach breaks 
it into its components. Recent research by 
Guthrie (1973) in the area of reading in-
dicates that for good readers, reading is a
unitary process. However, for poor readers
the process is not unitary and consists of a
number of independent subskills. Guthrie's 
research implies that in the process of be- 
coming a skilled reader, the subskills must 
be integrated in such a way that they be- 
come unified into a single process that we 
call “reading.” 

As necessary as the integration of these 
subskills appears to be, there is still an 
additional requirement for the development 
of a good reader, namely, that the skills used 
in reading be run off with no attention. 
When attention is required for the execution 
of a skill, immediate access of meaning is
prevented (LaBerge & Samuels, 1974). Flu-
ent readers are characterized by their ability
to perform the necessary decoding auto-
matically—without attention—so that their
limited attention and memory capacity is
left free to process meaning. At the present 
time, the psychology of reading has dealt
with the problems inherent in developing
accuracy in word recognition as well as the
delineation of some of the subskills involved 
in the reading process. The field has not yet 
dealt sufficiently with the problem of effi- 
cient training strategies for going beyond 
accuracy to automaticity, so that Huey’s 
(1908) advice about how to develop auto- 
maticity is still the level at which we are 
operating today: “repetition progressively 
frees the mind from attention to details, 
makes facile the total act, shortens the 
time, and reduces the extent to which con- 
sciousness must concern itself with the 
process [p. 104].” 

In order for the subskills to be executed as 
a wholistic process without the need for 
attention, an extended amount of practice 
beyond mere accuracy is required.


SELF-DETERMINATION OF ACADEMIC STANDARDS
BY CHILDREN:
TOWARD FREEDOM FROM EXTERNAL CONTROL


JEFFREY J. FELIXBROD AND K. DANIEL O'LEARY
State University of New York at Stony Brook 


The effects of contingent reinforcement, with performance standards 
self-determined or externally imposed, upon children’s productivity 
in a naturalistic setting were examined. Children in one contingent 
reinforcement condition self-determined their academic performance 
standards, while the same requirements were externally imposed upon 
yoked children in a second contingent reinforcement condition. Those 
in a no-reinforcement control condition performed in the absence of 
external reward. In a reinforcement phase, children in the self-deter- 
mination condition were significantly more productive than control 
subjects and performed as well as those for whom academic stan- 
dards were externally imposed; however, when reinforcement con- 
tingencies were later absent for all groups, there was a tendency, not 
statistically significant, for children in the self-determination condition 
to be less productive than those who had experienced externally im-
posed standards.


There have been countless demonstrations 
that children's behavioral productivity can 
be increased through contingencies adminis- 
tered and controlled by an outside agent. 
This line of investigation recently has been 
cross-fertilized by studies of behavioral self- 
control. A number of experiments have dis-
closed that behavioral self-management
may serve as a potentially effective alterna-
tive to external control (e.g., Bandura &
Perloff, 1967; Bolstad & Johnson, 1972; 
Drabman, Spitalnik, & O'Leary, 1973; 
Kaufman & O’Leary, 1972). However, al- 
though these studies have examined compo- 
nents of behavioral self-regulation such as 
self-instruction, self-monitoring, and self- 
administration of reinforcement, little re- 
search has focused upon a comparison of the 
effects of contingent reinforcement in situa- 
tions in which performance standards (i.e., 
behavioral requirements for differential 
levels of reward) are self-determined rather 
than externally imposed. The few investiga- 
tions which have examined the effects of 
contingent reinforcement under conditions 
of self-determined and externally imposed 
performance standards have generated con-
tradictory findings. Lovitt and Curtiss
(1969) found that the academic response 
rate of a 12-year-old boy was higher when 
he, rather than the teacher, selected his 
educational requirements. On the other 
hand, Glynn (1970) observed no difference
in academic performance as a function of
student-determined and experimenter-deter- 
mined reinforcement. 

In a study by Felixbrod and O'Leary 
(1973), elementary school children in one 
contingent reinforcement condition self- 
determined their academic performance re- 
quirements for earning points which could 
later be exchanged for prizes. In a second 
contingent reinforcement condition, the
same standards were externally imposed;
these children were individually yoked to
subjects in the first condition so that the
same performance standards were em-
ployed. The two reinforcement treatments 
were found to be equally effective; both 
groups were observed to be more productive 
than was a no-reinforcement control group. 

The significance of such findings is not 
fully understood, though, until the effects of 
withdrawing such contingencies, as well as 
introducing them, are known. Thus, the 
present experiment was designed primarily 
to compare the relative effects of contingent 
reinforcement, with amount self-determined 
or externally imposed, on the resistance to 
extinction of behavioral productivity fol-
lowing the withdrawal of all material in-
centives. Embodied in the design was a con-
structive replication of Felixbrod and 
O'Leary (1973). 

It is likely, of course, that some loss of 
productivity will follow the withdrawal of 
self-determined or externally determined 
contingencies; critical questions include 
both the magnitude and rate of this decre- 
ment, relative both to a continually per- 
forming no-incentive control group and to
one another. Compared to the performance
of the self-determination condition in ex-
tinction, children in the external imposition
condition conceivably might be more, less,
or equally productive.


METHOD


Subjects


The subjects were 24 white children, 


12 girls, between eight and nine years of age, drawn


Setting and Material


The experiment was conducted in separate
rooms provided by the school, where subjects were
tested individually. At the beginning of every ses-
sion, the child was given a packet of 12 sheets
(8½ × 11 inches), each containing 20 grade-level-
appropriate arithmetic problems. All subjects were
presented with the same arithmetic questions in
the same order during any one session. Difficulty
level of the problems was constant across sessions.


Procedure 


A 3 × 2 × 5 (Treatments × Experimental
Phases × Sessions) factorial design was employed.
Experimental phases consisted of reinforcement
and extinction; there were five sessions within
each experimental phase. Sessions, each of which
lasted up to 20 minutes, were conducted every
Monday, Wednesday, and Friday, with the excep-
tions of school holidays and pupil absences. In
order to prevent potential confounding due to dif- 
ferential absence from classroom activities, sub- 
jects in all three experimental conditions were 
tested during the same instructional periods. 

Three experimental treatment conditions con-
sisting of 8 subjects each were formed: (a) a con-
tingent reinforcement condition in which perfor-
mance standards were self-determined; (b) a
contingent reinforcement condition in which per-
formance standards were externally imposed; and
(c) a no-reinforcement control condition. The 24 
children were divided into eight sets of 3 subjects, 
each set matched on the basis of sex and score level 
on two 20-minute arithmetic pretests. The 3 chil- 
dren within each set were randomly assigned to dif- 
fering experimental conditions. Two sets of sub- 
jects then were randomly but permanently as- 
signed to each of four white female undergraduate 
experimenters.
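
The matched assignment described above can be sketched as follows. The article does not say how the matched sets were constructed, so grouping children of the same sex by adjacent pretest scores is an assumption, as are the function name, the tuple layout, and the seed parameter; only the random assignment of the 3 children within a set to the three conditions is taken from the text.

```python
import random

CONDITIONS = ("self-determination", "external imposition", "no reinforcement")


def assign_matched_triads(children, seed=None):
    """Form same-sex triads of adjacent pretest scores, then randomize within each.

    children: list of (name, sex, pretest_score) tuples.
    Returns one dict per triad mapping condition -> child name.
    """
    rng = random.Random(seed)
    triads = []
    for sex in sorted({child[1] for child in children}):
        # Sorting by score puts children of similar score level side by side.
        same_sex = sorted((c for c in children if c[1] == sex), key=lambda c: c[2])
        if len(same_sex) % 3 != 0:
            raise ValueError("each sex must supply a multiple of 3 children")
        for start in range(0, len(same_sex), 3):
            triad = same_sex[start:start + 3]
            conditions = list(CONDITIONS)
            rng.shuffle(conditions)  # random assignment within the matched set
            triads.append({cond: child[0] for cond, child in zip(conditions, triad)})
    return triads


if __name__ == "__main__":
    # Hypothetical roster: 12 girls and 12 boys with made-up pretest scores.
    roster = [(f"girl{i}", "F", 20 + i) for i in range(12)]
    roster += [(f"boy{i}", "M", 20 + i) for i in range(12)]
    for triad in assign_matched_triads(roster, seed=1):
        print(triad)
```

Because each set contributes exactly one child to every condition, the three groups are equated on sex and pretest level before any treatment is applied.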

Reinforcement phase. The following instructions 
were read to subjects in the self-determination 
condition at the beginning of each of the five ses- 
sions: 


1. When people work on a job, they get paid
for what they do. I am going to pay you points
which you can use to buy these prizes (pointing
to prizes and point-exchange values). Your job
is to answer these arithmetic questions. Answer
the questions in order. In order to earn the
points, only correct answers will count (re-
peated). You will have 20 minutes to do these.
But you can stop before 20 minutes are up if
you want to. Just come outside the door where
I'll be waiting, if you want to stop sooner.

2. I am going to let you decide how many
points you want to get paid for each right an-
swer. Take a look at the numbers on the next
page (pointing to a separate page on which the
subject is to choose a performance standard). I
want you to decide how many points you want
to get paid for each right answer. (Experimenter
points to each possible choice in a list of 10 pos-
sible performance standards: “I want to get paid
1 point; 2 points; ... 10 points for each right
answer.”) After I leave the room, draw a circle
around the number of points you want to get
paid for each right answer.

3. Remember, you can stop and leave when 
you want to, or else I'll tell you when the 20 
minutes are up. I will be right outside. (Instruc- 
tions repeated if necessary.) 


At the start of Sessions 2 through 5 of the rein- 
forcement phase, the experimenter again read the 
instructions. During these sessions, subjects in the 
self-determination condition were given the oppor- 
tunity to choose the same or a different perform- 
ance standard. 

Instructions given to children in the external im- 
position condition were identical to those presented 
to the self-determination children, except that in 
place of the second paragraph was the statement, 
"You will be paid —— points for each right an- 
swer." In other words, the performance standards 
adopted by a child in the self-determination con- 
dition were applied to the subject paired with him 
in the external imposition condition. For example, 
if a particular child in the self-determination con-
dition chose to receive 7 points per correct answer 
in Session 1, 9 points per correct answer in Session 
2, and 10 points per correct answer in Sessions 3, 
4, and 5 of the reinforcement phase, the same pat- 
tern of performance standards was set for his 
matched counterpart in the external imposition 
condition. For children in the no-reinforcement 
condition, instructions differed from those given to 
self-determination subjects in that Paragraph 2 was 
omitted, as were the first, second, and fifth sen- 
tences of Paragraph 1; in addition, no-reinforce- 
ment subjects were told, “You will receive no 
points or prizes for your work.” Children in the non-
reinforcement condition received the same in- 
structions in both the reinforcement and extinc- 
tion phases. 
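
The yoking procedure can be sketched in a few lines. The child identifiers, function names, and the numbers of correct answers are hypothetical; the sequence of standards (7, 9, 10, 10, 10) is the example given in the text, and points are computed as the standard multiplied by the number of correct answers, following the instructions read to the children.

```python
def yoke_standards(chosen_standards, pairs):
    """Copy each self-determination child's session-by-session standards
    to the external imposition child yoked to him.

    chosen_standards: dict of child -> list of points per correct answer.
    pairs: dict of self-determination child -> yoked external imposition child.
    """
    return {partner: list(chosen_standards[child]) for child, partner in pairs.items()}


def points_earned(standards, correct_counts):
    """Points per session: points per correct answer times correct answers."""
    if len(standards) != len(correct_counts):
        raise ValueError("one standard and one count are needed per session")
    return [standard * correct for standard, correct in zip(standards, correct_counts)]


if __name__ == "__main__":
    chosen = {"child_a": [7, 9, 10, 10, 10]}            # standards from the text's example
    imposed = yoke_standards(chosen, {"child_a": "child_b"})
    print(imposed)
    print(points_earned(imposed["child_b"], [3, 4, 5, 5, 6]))  # hypothetical counts
```

The yoked child thus works under exactly the same rate of payment in every session and differs from his partner only in not having chosen it.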

A number of procedures designed to minimize 
extraneous social influence were employed. Sub-
jects performed the task in the absence of the ex-
perimenter. Children in the self-determination con-
dition did not select a performance standard un-
til after the experimenter had left the room. Since
incidental modeling cues may influence perform-
ance in novel situations (as noted by Liebert, 
Spiegler, & Hall, 1970), the experimenter pointed 
to each of the 10 possible performance standards 
(rather than to only 1 or 2) that a child in the self- 
determination condition could select. Rewards 
consisted of candy and toys that ranged in value 
from two cents to about one dollar. In the rein- 
forcement phase, prizes were openly displayed 
while children in the self-determination and ex- 
ternal imposition conditions performed the task. 

No prizes were present when subjects in the no-
reinforcement condition performed. The experi-
menters reported that, in general, the children
seemed to be very impressed by the more expen-
sive prizes. At the end of each reinforcement ses-
sion, subjects in the self-determination and exter- 
nal imposition conditions were required to ex- 
change for rewards all the points they had earned. 
Prizes were placed in paper bags which were de- 
livered by the experimenter to the teacher who dis- 
pensed these rewards to the children at the end of 
the school day. 

Extinction phase. In the five extinction sessions, 
instructions to subjects in the self-determination 
and external imposition conditions were identical 
to those given to the no-reinforcement subjects. 
The children were again asked to solve arithmetic 
problems but were informed that they would be 
paid neither points nor prizes for their work. 


RESULTS 


Major dependent variables in this experi- 
ment were the number of correct problem 
solutions and the amount of time spent in 
the task setting; in addition, performance 
standards selected by self-determination 
subjects at the start of each reinforcement 
session were recorded. Reinforcement and 
extinction data were analyzed separately. 


Correct Solutions 


Figure 1 presents the mean number of 
problems solved correctly by the children. 

Reinforcement phase. A 3 X 5 analysis 
of variance disclosed both a significant main 
effect for sessions (F = 3.27, df = 4/84, p <
.02) and a significant Treatments x Ses-


Figure 1. Number of arithmetic problems solved
correctly by subjects in the self-determination, ex-
ternal imposition, and no-reinforcement conditions.


TABLE 1


NEWMAN-KEULS COMPARISONS OF MEAN NUMBER
OF CORRECT PROBLEM SOLUTIONS


Reinforcement session 


1 2 3 4 | 5 


Self-determination vs. no reinforcement 


ns ns ns p< o1|p< o 


External imposition vs. no reinforcement 


ns ns p< .01 p< o|p< o 


Self-determination vs. external imposition 


ns ns ns ns | ns 


sions interaction (F = 2.58, df = 8/84, p <
.02); the main effect for treatments did not
achieve statistical significance (F = 2.49, 
df = 2/21, p < .15). Newman-Keuls com- 
parisons were employed to further examine 
the significant interaction. Table 1 presents 
the results of these comparisons for each 
of the five sessions of the reinforcement 
phase. The superiority in performance of 
the self-determination condition relative to 
the no-reinforcement condition was appar- 
ent in Sessions 4 and 5, while the external 
imposition condition was superior to the 
no-reinforcement condition in Sessions 3, 
4, and 5, suggesting that, indeed, both of 
these experimental treatments were foster- 
ing greater productivity relative to the con- 
trol condition. In none of the five sessions 
did the self-determination versus the ex- 
ternal imposition comparisons reveal a sig- 
nificant difference. 

Extinction phase. Analysis of variance 
disclosed a significant main effect for ses-
sions (F = 8.78, df = 4/84, p < .001). Chil-
dren generally performed less well as the 
extinction phase progressed. Neither the 
main effect for treatments (F = 2.20, df = 
2/21, ns) nor the Treatments x Sessions 
interaction (F = 1.87, df = 8/84, ns) was 
significant. However, as seen in Figure 1,
there was a strong tendency on the part of
external imposition subjects to outperform
those in both the self-determination and no-
reinforcement conditions.


Time at Task


Figure 2 presents the mean number of 
minutes spent in the task setting. 
Reinforcement phase. Both the main ef-
fect for treatments (F = 4.92, df = 2/21,


action (F = 1.36, df = 8/84, ns) was not 
statistically significant. Individual compari-
sons disclosed that both self-determination
and external imposition subjects spent more
time in the task setting than no-reinforce-
ment subjects (t = 2.12, p < .05, one-
tailed, and t = 2.82, p < .01, one-tailed,
respectively), while self-determination and
external imposition subjects did not differ
in time at task (t = -.50, ns, two-tailed). It
is apparent that the effect of reinforcement
was to maintain time spent in the task set-
ting; in the absence of reinforcement, time
at task decreased.

Extinction phase. The main effect for ses-
sions was significant (F = 9.15, df = 4/84,
p < .001). Children generally spent less
time in the task setting as the extinction
phase progressed. As was the case on the
dependent measure of correct solutions,
neither the main effect for treatments (F =
1.51, df = 2/21, ns) nor the Treatments x
Sessions interaction (F = .90, df = 8/84, 
ns) was significant. However, as seen in 
Figure 2, there was a tendency on the part 
of external imposition subjects to spend 
more time in the task setting than those in 
the self-determination and no-reinforce- 
ment conditions. 

Further statistical analysis revealed no 
sex differences in children’s responses to ex- 
perimental treatments. 


Self-Determined Performance Standards 


In the reinforcement phase, children in 
the self-determination condition could 
choose to receive between 1 and 10 points 
per correct solution. At the start of the first 
session, seven of the eight subjects selected 
the most lenient performance standard (10 
points per correct solution), while one child 
chose to receive 7 points per correct solu- 
tion. At the start of Sessions 2 through 5, 
every child selected the most lenient per- 
formance requirement. It is clear that chil- 
dren responded in a manner that would 
maximize their contingent reward. 


Discussion 


The overall results of the present investi- 
gation provide evidence for the efficacy of 
contingent reinforcement when academic 
standards are self-determined or externally 
imposed. In the reinforcement phase, con- 
tingent consequences maintained children’s 
academic behavior equally well under con- 
ditions of self-imposed and externally im- 
posed standards; thus, the findings of Felix-
brod and O’Leary (1973) were replicated.
In the extinction phase, analysis of variance
disclosed no statistically significant differ- 
ences between the groups on the measures 
of correct solutions and time at task; how- 
ever, a strong tendency on the part of ex- 
ternal imposition subjects to outperform 
those in the self-determination condition 
was noted. This latter finding is deserving of
further investigation.

Children in the self-determination condi-
tion were observed to maximize their re- 
wards by selecting the most lenient per- 
formance standard in the reinforcement 
phase, a finding consistent with those of 
Felixbrod and O’Leary (1973) and Santo- 
grossi, O’Leary, Romanczyk, and Kaufman
(1973). Apparently, many children, in the 
absence of social surveillance and in the 
presence of high magnitude rewards, will 
attempt to maximize their positive out- 
comes, especially if they can discriminate 
that no externally administered aversive 
consequences follow the self-imposition of 
lenient standards. 

In this time of great concern over the 
possible misuse of behavioral technology, 
the development of self-management skills 
is a step toward freedom from external con- 
trol. The results of this experiment and 
others (e.g., Bolstad & Johnson, 1972; Drab- 
man et al., 1973; Felixbrod & O'Leary,
1973; Glynn, 1970) demonstrate the utility 
of self-management as an alternative to 
external control. However, in light of the 
present finding of self-imposed leniency in 
the absence of social surveillance, the task 
remains to determine how children could be 
taught to maintain reasonably stringent 
standards over time (i.e., do a fair amount 
of work for very little reward) without the 
need for continuous external monitoring. 
One method worthy of study involves the 
use of intermittent verbal feedback in which 
praise for adherence to moderately stringent 
self-imposed demands might be employed 
alone or used in combination with disap- 
proval of self-imposed leniency. In addition, 
research is needed to determine the variables 
in a child’s social learning history which 
may differentially affect responding during 
and following the administration of rein- 
forcement under conditions of self-deter- 
mined and externally imposed contingencies. 
Functional relationships between controlling 
stimuli and the behavioral components of 
self-control require further examination in 
a variety of situations.


The study investigated the effect of prior student teaching evalua-
tions and lecture presentation on ratings of teaching performance. 
One half of the subjects were introduced to a professor using a written 
biographical description including a positive teaching evaluation, 
while with the other subjects a negative evaluation replaced the 
positive one. Following the lecture, subjects completed a teaching 
evaluation. The two lectures differed only in the material presented. 
Results showed that prior expectations of teaching performance in- 
fluenced ratings of the professor on a number of dimensions, with the 
positive evaluation group rating the professor significantly higher than 
the negative evaluation group. The effect persisted in both lectures
and does not appear to be restricted to specific course material.


Various groups within universities have 
adopted the practice of evaluating profes- 
sors’ teaching performance, in part to pro- 
vide feedback to the professors for possible 
self-improvement. Learning psychologists 
attest to the importance of feedback in the 
acquisition and change of various behavior 
patterns. Skinner (1957) has suggested that
performance feedback acts as a discrimina- 
tive stimulus to an organism, informing it 
what to do and what not to do in order to 
attain a specified goal. If effective teaching
is a goal, then feedback about performance
can facilitate acquisition of the necessary 
behaviors. 

It is less certain, however, whether publi-
cizing such information to the entire aca-
demic community is essential for improved
teaching performance. Ostensibly, profes-
sor/course evaluations are made available
to the general university population to in-
form students which professors are the best
teachers. The present study considered the 
— 

"Parts of this experiment were presented at the 
Canadian Psychological Association's Annual Con- 
pao in Victoria, British Columbia, June 1973. 
apart would like to thank W. J. McKeachie 

"his critical comments of an earlier draft. 

Requests for reprints should be sent to Ray-
mond P. Perry, Department of Psychology, Uni-
versity of Manitoba, Winnipeg, Manitoba R3T
2N2, Canada.


implications of such procedures by investi- 
gating the impact that preexisting, positive 
and negative evaluations have on ratings of 
teaching performance. It was hypothesized 
that the evaluations create certain expecta- 
tions in judges which bias subsequent rat- 
ings in the direction of the earlier evalua- 
tions. 

Empirical support for the hypothesis 
comes from several areas of psychology. For 
instance, Asch (1946) found that certain 
kinds of information can significantly alter 
ensuing personality judgments. Similar re- 
sults were reported by Kelley (1950), who 
introduced a lecturer using a written bio- 
graphical description in which the word 
warm was used to describe the lecturer’s 
personality for one half of the subjects, 
while for the remaining half, the word cold 
was used. Alternating certain kinds of in- 
formation caused drastic changes in person- 
ality evaluations, even though the lecture 
presentation was exactly the same for both 
groups. Rosenthal and Jacobson (1968) 
have shown that an individual’s expecta- 
tions regarding another person’s behavior 
may provide the basis for a self-fulfilling 
prophecy. Expectancy effects have been 
found even to influence experimental out- 
comes in scientific research where an exper- 
imenter can obtain support for his hypothe- 
sis by producing demand characteristics
which cause the subject to respond in the
predicted direction (Adair, 1973; Orne, 
1962; Rosenthal, 1966). 

Taken together, the research suggests 
that prior expectations significantly influ-
ence later evaluation. In essence, a kind of 
reverse Rosenthal effect was predicted in 
that the students’ expectations were pre- 
dicted to influence their future evaluations 
of the professor. Those students receiving 
the negative teaching evaluation would rate 
the professor’s teaching performance more 
negatively than students who received a 
positive evaluation. The manipulations 
were carried out under two separate lecture 
conditions in order to assess the generaliza- 
bility of the effect. 


METHOD 


Subjects 


The subjects were 238 students enrolled in the 
introductory psychology courses at the University 
of Manitoba, Winnipeg, Canada. The 50 subjects 
who rated the 16 professor evaluations and the 103 
Lecture 2 subjects participated for experimental 
credits as part of the introductory course. Lecture
1 subjects participated under actual classroom conditions.


Materials 


The experiment consisted of two parts which 
differed only in the lecture material: Lecture 1 
involved a brief description of several concepts in 
Freud’s theory of personality, while the Lecture 2 
material focused on some basic concepts in de- 
velopmental psychology. The University of Mani- 
toba Student Union (UMSU) anticalendar was 
used to select the positive and negative teaching 
evaluations. The UMSU anticalendar listed each 
professor followed by the mean ratings of each 
question on the UMSU professor/course evaluation 
questionnaire and a brief written summary of the 
ratings and of students’ comments. The positive 
and negative evaluation summaries selected were 
as follows: 

Professor D(+). As a researcher he has investi- 
gated several variables related to perceptual dis- 
tortion. Primarily this research has examined vari- 
ous components of the visual system in humans. 
As a teacher, the UMSU teaching evaluations rated
Professor D at or near the top of the academic department
in all important teaching categories. Students
praised the lectures given by Professor D. They
were informative, lively, and stimulating. A
relaxed atmosphere in the class encouraged discussion.
A friendly attitude between student and
professor enhanced the learning experience. His
sections received one of the top course ratings
in the department. The subject matter was very
interesting to the students, and as one student
commented, “this course is one of the most
relevant I have taken.” The course material was
personally relevant to the students. Professor D
was most highly recommended.


Professor A(-). As a researcher he has investigated
several variables related to perceptual distortion.
Primarily this research has examined various
components of the visual system in humans.
As a teacher, the UMSU teaching evaluations rated
Professor A below average in most categories when
compared to other instructors of the department.
Professor A did not show much enthusiasm for
teaching, did not greatly encourage class participation
or outside consultation, and did not take
much of a personal interest in most of the students.
Lectures were boring and monotonous,
and often consisted of a dictation of notes. Also,
Professor A did not cover all the material in
class, leaving it up to the students. In class,
Professor A attempted to maintain “high school
discipline.”


The UMSU teacher rating form consisted of
eight questions: five pertaining to the skill of the
instructor, two pertaining to the interaction of
the instructor with students, and one question
concerning an overall rating of the professor in
comparison to other university professors. Each
question was followed by a five-category scale ranging
from extremely unfavorable to extremely favorable.
These questions are similar to those found on the
McKeachie-Lin Teacher Rating Form, which pinpoints
six stable factors related to teaching evaluation
(Isaacson, McKeachie, & Milholland, 1963).


Procedure 


The teaching evaluations were taken from the
1971-1972 UMSU anticalendar, which is made available
to students at the beginning of the academic
year. Eight positive and eight negative evaluations
were selected from one discipline, psychology, and
modified such that an alphabetical letter replaced
the professor's actual name. The 16 evaluations
were combined into a booklet, and each was followed
by a 9-point rating scale, ranging from extremely
negative (-4) to extremely positive (+4).
These booklets were administered to a group of
50 judges who rated each evaluation on the scale
provided. The most extreme positive and extreme
negative evaluations were selected for inclusion in
the personal description of the lecturer. The summary
included biographical information such as occupation
and education, and this was followed by
either the positive or the negative teaching evaluation
(see above, Professors D and A).

A professor, blind to the experimental conditions,
presented a lecture to a group of subjects
which was divided into two parts: one half received
the summary with the positive evaluation while
the second half received the negative evaluation
summary. In Lecture 1, the material was presented
during a regular class of introductory psychology.
The subjects were told only that they would be
having a guest lecturer whose background was outlined
in the written summary. Lecture 1 focused
on fewer concepts, was concerned with an inherently
more interesting topic (psychoanalysis), was
intended to be more humorous and relevant, and
took place during a regular class meeting. An express
attempt was made to disguise the experiment,
to allay any demand characteristics and, more
importantly, to determine the generalizability of the
phenomenon to actual classroom conditions. Subsequent
questioning during debriefing indicated
that the students were unaware of the manipulations
while the experiment was in progress.

In Lecture 2, the subjects were volunteers from
the psychology department subject pool who were
told that the researchers were interested in learning
and teaching effectiveness. Lecture 2 was an
overview of numerous developmental concepts,
dealt with more theoretical issues, made little
effort to engender humor or relevancy in the issues,
and took place in a laboratory setting. The biographical
information and the professor were identical
in both lecture conditions. Following the lecture,
the subjects were administered the teaching
questionnaire.


TABLE 1
UNWEIGHTED MEANS ANALYSES OF VARIANCE FOR EVALUATIONS OF PROFESSOR
SKILL, INTERACTION, AND OVERALL EXCELLENCE

Evaluation                          SS       df      MS        F

1. Skill of professor
   Lecture (A)                     18.60      1    18.60     4.62*
   Positive/negative (B)(a)         1.86      1     1.86     <1
   A × B                             .93      1      .93     <1
   Within cell                    740.78    184     4.03

2. Interaction of professor
   Lecture (A)                     13.02      1    13.02    17.86**
   Positive/negative (B)(a)         5.12      1     5.12     7.02**
   A × B                             .00      1      .00     <1
   Within cell                    134.12    184      .73

3. Overall excellence of professor
   Lecture (A)                     32.56      1    32.56    58.03**
   Positive/negative (B)(a)        14.42      1    14.42    25.70**
   A × B                            1.86      1     1.86     3.32
   Within cell                    103.27    184      .56

Note. These are three separate analyses.
(a) Positive/negative refers to the positive/negative teaching
evaluation contained in the professor description.
* p < .05.
** p < .01.


RESULTS 


Three 2 x 2 factorial analyses of vari- 
ance for unweighted means tested the ef- 
fects of positive/negative teaching evalua- 
tion and lecture presentation on each of the 
following dependent measures: skill of pro- 
fessor, interaction of professor, and overall 
excellence of professor in comparison to 
other university professors (Table 1). A 
student’s rating of skill of the professor was 
a mean of the five related questions; inter- 
action of the professor was a mean of the 
two related questions; and professor excel- 
lence was based on a single question (see 
Table 2 for cell means). While the positive/ 
negative evaluations did not affect the stu- 
dents’ ratings of the professor’s teaching 
skills (F < 1, df = 1/184, p > .05), they 
did significantly influence ratings of interaction
capacity (F = 7.02, df = 1/184, p <
.01) and ratings of overall excellence (F =
25.70, df = 1/184, p < .01). Students re- 
ceiving the positive evaluation, as com- 
pared to students receiving the negative 
evaluation, rated the professor as being more 
oriented to student interaction (2.47 versus 
2.14, respectively) and as being a better 
professor overall (3.97 versus 3.41, respectively)
on a scale ranging from 1 to 5. It is
worth noting that the positive/negative effect
held constant in both lecture conditions
for all three dependent variables. Summed
across the two lecture conditions, skill was
rated .41, interaction .66, and excellence
1.12 higher in the positive as compared to
the negative evaluation condition.


TABLE 2
MEAN RATINGS FOR PROFESSOR SKILL, INTERACTION, AND EXCELLENCE AS A
FUNCTION OF POSITIVE/NEGATIVE TEACHING EVALUATIONS AND
LECTURE PRESENTATION

Professor           Positive evaluation       Negative evaluation
characteristic      Lecture 1   Lecture 2     Lecture 1   Lecture 2

Skill                 4.48        3.93          4.36        3.64
Interaction           2.76        2.17          2.37        1.90
Excellence            4.48        3.46          3.74        3.08
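The F ratios reported for these effects follow from the sums of squares and degrees of freedom in Table 1. The sketch below is only an illustration of that arithmetic, not the authors' computation: the dictionary layout and the function names `mean_square` and `f_ratio` are assumptions introduced here, and small differences from the printed values are expected because the published entries are rounded.

```python
# Sums of squares (SS) as printed in Table 1. Each measure is a separate
# 2 x 2 (lecture x positive/negative evaluation) analysis.
DF_EFFECT = 1    # each main effect and the A x B term has 1 df
DF_WITHIN = 184  # within-cell (error) degrees of freedom

SS = {
    "skill": {"lecture": 18.60, "evaluation": 1.86, "a_x_b": 0.93, "within": 740.78},
    "interaction": {"lecture": 13.02, "evaluation": 5.12, "a_x_b": 0.00, "within": 134.12},
    "excellence": {"lecture": 32.56, "evaluation": 14.42, "a_x_b": 1.86, "within": 103.27},
}


def mean_square(ss, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return ss / df


def f_ratio(ss_effect, ss_within, df_effect=DF_EFFECT, df_within=DF_WITHIN):
    """F = MS(effect) / MS(within cell)."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_within, df_within)


for measure, terms in SS.items():
    for effect in ("lecture", "evaluation", "a_x_b"):
        f = f_ratio(terms[effect], terms["within"])
        print(f"{measure:12s} {effect:11s} F = {f:6.2f}")
```

Dividing each effect's mean square by the within-cell mean square of the same analysis is the standard fixed-effects F test for a two-factor design; the recomputed values agree with the table to within rounding.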

Lecture presentation had a significant effect
on the students’ ratings of the professor’s
teaching skill (F = 4.62, df = 1/184,
p < .05), his interaction capacity (F =
17.86, df = 1/184, p < .01), and his overall
excellence (F = 58.03, df = 1/184, p <
.01). Students in Lecture 1, as compared to
students in Lecture 2, rated the professor as
more skillful (4.42 versus 3.79, respectively),
as interacting more with the students
(2.57 versus 2.04, respectively), and
as being a better professor overall (4.11
versus 3.27, respectively). Summed across the
positive and negative evaluation conditions,
Lecture 1 produced more favorable ratings than
Lecture 2 for skill of the professor (1.27 higher),
interaction capacity (1.06 higher), and overall
excellence (1.68 higher).
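The marginal means and the "higher" figures quoted in the Results can be recovered from the cell means in Table 2. The following is a minimal sketch under stated assumptions: the names `CELL_MEANS`, `marginal_mean`, and `summed_difference` are hypothetical, the marginal means are taken as unweighted averages of the two relevant cells, and the Lecture 2 positive-evaluation excellence cell is entered as 3.46, the value implied by the marginal means and drops reported in the text.

```python
# Cell means on the 1-5 scale, keyed by (evaluation, lecture).
CELL_MEANS = {
    "skill": {("pos", 1): 4.48, ("pos", 2): 3.93, ("neg", 1): 4.36, ("neg", 2): 3.64},
    "interaction": {("pos", 1): 2.76, ("pos", 2): 2.17, ("neg", 1): 2.37, ("neg", 2): 1.90},
    # Assumption: 3.46 for the positive-evaluation, Lecture 2 cell (see lead-in).
    "excellence": {("pos", 1): 4.48, ("pos", 2): 3.46, ("neg", 1): 3.74, ("neg", 2): 3.08},
}

EVALUATION, LECTURE = 0, 1  # positions of the two factors in each key


def marginal_mean(cells, factor, level):
    """Unweighted mean of the cells at one level of a factor."""
    values = [m for key, m in cells.items() if key[factor] == level]
    return sum(values) / len(values)


def summed_difference(cells, factor, high, low):
    """Difference between two levels of a factor, summed over the other factor."""
    high_total = sum(m for key, m in cells.items() if key[factor] == high)
    low_total = sum(m for key, m in cells.items() if key[factor] == low)
    return high_total - low_total


for measure, cells in CELL_MEANS.items():
    pos = marginal_mean(cells, EVALUATION, "pos")
    neg = marginal_mean(cells, EVALUATION, "neg")
    lec1 = marginal_mean(cells, LECTURE, 1)
    lec2 = marginal_mean(cells, LECTURE, 2)
    print(f"{measure:12s} pos={pos:.3f} neg={neg:.3f} lec1={lec1:.3f} lec2={lec2:.3f}")
    print(f"{'':12s} pos-neg summed={summed_difference(cells, EVALUATION, 'pos', 'neg'):.2f}"
          f"  lec1-lec2 summed={summed_difference(cells, LECTURE, 1, 2):.2f}")
```

Because the design is 2 × 2, each summed difference is simply twice the difference between the corresponding marginal means, which is why figures such as 1.12 and 1.68 exceed any single cell-to-cell gap.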


DISCUSSION


The experiment investigated the effects of
preexisting positive/negative professor
teaching evaluations and lecture presentation
on ratings of teaching performance. An
attempt was made to facilitate the generalizability
of the results to the university environment
by conducting part of the experiment
under actual classroom conditions
with regular students as subjects (Lecture 
1). Furthermore, the stimulus materials 
used were evaluations that previously had 
been summarized and distributed to stu- 
dents by a university-wide, professor/ 
course evaluation committee. It is likely 
that, had the experimenters constructed the 
evaluations, the stimulus materials would
have had an even greater impact on the
professor ratings. Lastly, the UMSU 
Teacher Rating Form was selected since it 
had been used previously to assess teaching 
ability at the University of Manitoba and 
since it is probably representative of the 
teacher rating forms that are presently 
being used in faculties and universities
throughout the country. 

Research on feedback effects in the college
teaching environment has been minimal,
and existing data are somewhat equivocal
(Pambookian, 1974). Typically, these
studies have investigated the effect of
teaching-evaluation feedback on future
teaching behavior. However, the present
experiment focused on a crucial methodological
problem related to evaluation feedback:
the impact on students’ subsequent ratings
of teaching performance. The results,
consistent with findings in other areas of
psychology, showed that existing expectations
significantly influenced the ratings of the
professor’s teaching performance. Subjects
who initially received the negative teaching
evaluation rated the professor’s lecture
presentation lower in terms of the professor’s
capacity for interaction with students
and of his overall excellence as a teacher in
comparison to other university instructors.
Ratings of the professor’s skill were not
significantly influenced by expectancy effects;
however, the means were in the predicted
direction. Of these three dimensions, the most
important was considered to be overall
excellence, and it was this dimension which
showed the largest differences between the
positive and negative groups.

These differences persisted in both lecture
conditions, and there was an overall lecture
effect. Although the positive/negative feedback
effects existed in both lecture conditions,
Lecture 2 showed a pronounced overall
decline. The ratings of professor skill,
interaction, and excellence exhibited a consistent
drop for both positive (.55, .59, 1.02,
respectively) and negative (.72, .47, .66,
respectively) evaluation groups. Note that
Table 2 indicates lecture material may interact
with expectancy effects in that a
“poor” teacher presenting interesting material
is rated consistently higher in skill,
interaction, and excellence than a “good”
teacher presenting boring material. These
data suggest that lecture content can be a
major variable influencing performance ratings,
and they give some credence to the argument
of some professors who claim that
by teaching certain courses like learning,
statistics, and perception, their ratings suffer.
Unfortunately, a confound exists in the
experimental design in that the two lecture
conditions differed in both content material
and testing situation, and consequently, it
is difficult to determine precisely how these
factors contributed to the final outcome.

One major criticism surrounding such 
techniques as the UMSU Teacher Rating 
Form concerns their validity for measuring 
“good” and "bad" teaching performance. 
Recently, both Leventhal (1973) and 
Knapper, McFarlance, and Scanlon (1972) 
have argued against earlier contentions that 
some teacher rating forms are valid (e.g., 
McKeachie, Lin, & Mann, 1971). In either 
case, the results of the present experiment 
suggest that the general distribution of pro- 
fessor ratings can have adverse conse- 
quences. If teacher rating forms are invalid, 
then their potential value to the professor 
as feedback to facilitate goal attainment 
must be questioned, especially in the case of 
negative evaluations. From a learning the- 
ory viewpoint, improper discriminative 
stimuli are being used as well as punish- 
ment for making the incorrect response. 
This is the least conducive and most ineffi- 
cient method for achieving the desired re- 
sponse, effective teaching. 

Assuming for the moment that teacher 
rating forms are both valid and reliable, 
consider the instructor who gets negative 
ratings similar to Professor A’s in this 
study. Such ratings might have a direct impact
on his teaching effectiveness in a variety
of ways. First, the students who sign up
for his course may be of poorer quality
than the normal population. Those students
who are academically oriented and who are 
concerned about learning and teaching may 
actively avoid those sections having nega- 
tive evaluations. Second, the motivation 
level of the students may be similarly af- 
fected, with those having higher levels seek- 
ing those sections having positive evalua- 
tions. Third, we should expect the attitudes 
and expectations of those students enrolled 
in courses receiving negative evaluations to 
be changed in the direction of the ratings. 

These developments, taken together, can 
reduce the teaching effectiveness of the pro- 
fessor since his course has selectively sam- 
pled an atypical group from the general 
population. In studying those factors influencing
course selection, Leventhal, Breen,
and Perry (1974) found that as many as
35% of 2,500 introductory psychology stu- 
dents indicated that a professor's teaching 
reputation was the primary determinant in 
their course selections. Obviously, the pub- 
licized professor/course evaluations exert a 
major influence on the professor’s overall 
teaching reputation. The resultant effect is 
likely to produce a vicious circle in which 
negative evaluations cause poorer quality 
students to enroll in the course, creating 
adverse learning conditions, and eventually, 
negative reevaluations. 

An argument can be made that this ex- 
periment was a single exposure study and, 
hence, is not truly representative of teach- 
ing conditions which extend over lengthy 
periods of time. Furthermore, a professor 
should be capable of improving his ratings, 
in time, by changing his teaching style. 
This seems a reasonable argument to make 
regarding expectancy effects, since the so- 
cial psychological literature clearly shows 
that attitudes are modifiable. However, 
class selection procedures like those out- 
lined above lead to nonrandom sampling 
biases in which the poorly rated professor 
ends up with poorer quality students. This 
creates adverse learning conditions which 
increase the probability that the professor 
will receive negative ratings again. Consequently,
the publication of teaching evaluations
may create a set of conditions which
are extremely resistant to change, even over 
an extended period of time. 

It is apparent that a professor evaluation
system is needed in which the advantages 
of effective teaching are maintained, while 
the disadvantages of evaluation as demon- 
strated in this study are minimized. Within 
such a system, only the most positive eval- 
uations would be publicized in the antica- 
lendar, while average and poor ratings 
would be communicated to the respective 
professors for their own information. This 
would serve three functions. First, accord- 
ing to learning theory, it would reward ef- 
fective teaching directly and increase its 
value through social recognition. Second, it 
would reduce the punishing effects of nega- 
tive evaluations at least at the social level, 
with the result that the poorly rated professor
would not be punished for not making
the desired response. Third, for those professors
who received poor ratings, this system
would minimize biased selection procedures
so that they do not receive an unrepresentative
sample of poor quality students.
In addition, an opportunity should
be made available for professors to contest
evaluations before they are made public.
The professors should have access to the
original data and to the machinery that
was used to generate the written descriptions
of the professor's teaching performance.


This review is a secondary analysis of work done by the International
Association for the Evaluation of Educational Achievement. It con- 
cerns mathematics, reading comprehension, and science. The objective 
was to look for a relationship between age of entry into school and 
achievement as measured at ages 10 and 13. Principal findings suggest 
a specificity of effect between early entrance into school and greater 
achievement in mathematics but not in reading comprehension or 
science. The authors suggest that gain in achievement based on early 
education may be due to the sensory motor experiences which are so 
commonly part of early childhood education programs. It is further
suggested that the carefully planned and sequenced curriculum approach
to teaching mathematics may have an effect here.


Interest in early childhood education has
grown rapidly in Europe in the last few
years, reflecting an increasing demand for
equality of educational opportunities, the 
changing role of the family in society, and 
widening opportunities outside the home for 
women. Concurrent with the redefinition of 
family roles is a significant shift from edu- 
cation of young children as the primary re- 
sponsibility of parents to a shared respon- 
sibility with institutions of education. 

In the last 13 years, much has been written
about the importance of early childhood
education as a basis for later school learning.
Hunt (1961) and Bloom (1964) are important
contributors to this literature. An
outgrowth of these authors’ writings was the
conception and development of such projects
in the United States as “Head Start”
and “Follow Through.” The American efforts
have been duplicated in a number of
countries including England, with the Educational
Priority Act; the Netherlands, with
early compensatory education programs;
and the Federal Republic of Germany, with
a wide variety of cognitively oriented preschool
programs. Many of these projects,
in both North America and Europe, have
made the assumption that early entry into
school would have a beneficial effect on the
children's later school achievement. In a
number of national studies in Canada, the
United States, Sweden, and the Netherlands,
the benefits of these early school intervention
programs, particularly any long-term
effects, have been very difficult to document
(Bissel, 1973; Davie, Butler, & Goldstein,
1972; Halsey, 1972; Kohnstamm, 1968;
Plowden Report, 1967; Ryan, 1971; Stukat,
1971).

* Requests for reprints should be sent to Gilbert
Austin, Bureau of Educational Research and
Field Services, College of Education, University of
Maryland, College Park, Maryland 20742.


METHOD 


The studies identified above have primarily
been national efforts whose primary or secondary 
purpose is to address the question of effective age 
of entry into school. Since very little variation in 
effective age of entry into school occurs within a 
country, these studies are of limited value. An 
alternate method of looking at this question is to 
examine the results of multinational studies whose 
primary goal is not to answer questions about age 
of entry into school but whose data banks can be 
used in secondary analyses to give some insights 
about the problem.

It is becoming more common to undertake secondary
analyses since in large data collections
the temporal and financial constraints on such
projects usually mean that only a fraction of the
possible analyses have been undertaken. This
article has taken data from three International
Educational Achievement (IEA) studies, in mathematics
(Husen et al., 1967; Postlethwaite, 1967),
science (Comber & Keeves, 1973-1974), and reading
comprehension (Thorndike, 1973-1974), and reanalyzed
them in relationship to both the official age
of entry and effective age of entry from Organization
for Economic Cooperation and Development
studies (Austin, 1973a, 1973b).

To discover whether the age at which students 
enter school is causally related to some specified 
later achievement, it would be necessary to assign 
students at random to various groups and have 
each group start at a different age but be subjected 
to identical schooling. While theoretically sound, 
such an experimental research would be impossible 
to conduct in practice within any one educational 
system. 

The mean scores can be found in the original 
publications. Each country gave its official age of
entry. Austin recategorized countries by effective 
age of entry, the year in which 75% or more of an 
age group enter formal school. The IEA studies 
are cross-sectional surveys with small standard 
errors of sampling. 

Scattergrams were produced showing the relationship
between official age of entry, effective
age of entry, and mean scores.


TABLE 1
OFFICIAL AND EFFECTIVE AGE OF ENTRY, MEAN SCORES, AND STANDARD
DEVIATIONS IN MATHEMATICS (AGE 13)

                                          Population 1A    Population 1B
Country                       OAE   EAE     M      SD        M      SD

Australia                      6     5     20.2   14.0     18.9   12.3
Belgium                        6     3     27.7   15.0     30.4   13.7
England                        5     5     19.3   17.0     23.8   18.5
Finland                        7     7     15.4   10.8     16.1   11.6
France                         6     4     18.3   12.4     21.0   13.2
Germany (Federal Republic)     6     6      --     --      25.5   11.7
Israel                         6     4      --     --      32.2   14.7
Japan                          6     6     31.2   16.9     31.2   16.9
Netherlands                    6     4     23.9   15.9     21.4   12.1
Scotland                       5     5     19.1   14.0     22.3   15.7
Sweden                         7     7     15.7   10.8     15.3   10.8
United States                  6     5     16.9   13.3     17.8   13.3

Note. Abbreviations: OAE = official age of entry, EAE = effective
age of entry.


TABLE 2
OFFICIAL AND EFFECTIVE AGE OF ENTRY, MEAN SCORES, AND STANDARD
DEVIATIONS IN READING COMPREHENSION AND SCIENCE (AGE 10)

                                          Reading comprehension   Science
Country                       OAE   EAE     M      SD               M

Belgium (Flemish)              6     3     17.5    9.2          [illegible]
Belgium (French)               6     3     17.9    9.3          [illegible]
England                        5     5     18.5   11.6          [illegible]
Finland                        7     7     19.4   10.8          [illegible]
Germany (Federal Republic)     6     6      --     --           [illegible]
Hungary                        6     5     14.0    9.8             16.7
Israel                         6     4     13.9   11.0              --
Italy                          6     5     21.6    9.6             17.5
Japan                          6     6      --     --              21.7
Netherlands                    6     4     17.7    9.5             15.3
Scotland                       5     5     18.4   11.1             14.0
Sweden                         7     7     21.5   10.5             18.3
United States                  6     5     16.8   11.6             17.7

Note. Abbreviations: OAE = official age of entry, EAE = effective
age of entry. [Science standard deviations are illegible in the source.]


RESULTS


Tables 1 through 3 and Figures 1 through
4 present the data for national mean scores
for 13-year-olds for mathematics (Populations
1a-1b), 10-year-olds and 14-year-olds
for reading comprehension, and 10-year-olds
for science and their relationships to official/
effective ages of entry.*

It will be seen that earlier age of entry is
associated with higher scores in mathematics
at age 13 (Figures 1 & 2) and with lower
scores in reading and science at age 10
(Figures 3 & 4).
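The direction of the mathematics relationship can be checked directly from the national means in Table 1. The sketch below is an illustration only: it assumes the coefficients shown on the scattergrams are ordinary Pearson product-moment correlations computed over national means, it is restricted to official age of entry, and it treats the single Germany and Israel entries as Population 1B values. The names `MATH_AGE_13`, `pearson_r`, and `correlation_for` are hypothetical.

```python
import math

# (country, official age of entry, Population 1A mean, Population 1B mean).
# None marks a population for which no mean is printed in Table 1.
MATH_AGE_13 = [
    ("Australia", 6, 20.2, 18.9),
    ("Belgium", 6, 27.7, 30.4),
    ("England", 5, 19.3, 23.8),
    ("Finland", 7, 15.4, 16.1),
    ("France", 6, 18.3, 21.0),
    ("Germany (FR)", 6, None, 25.5),
    ("Israel", 6, None, 32.2),
    ("Japan", 6, 31.2, 31.2),
    ("Netherlands", 6, 23.9, 21.4),
    ("Scotland", 5, 19.1, 22.3),
    ("Sweden", 7, 15.7, 15.3),
    ("United States", 6, 16.9, 17.8),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_for(population_index):
    """Correlate official age of entry with the mean score of one population.

    population_index is 2 for Population 1A and 3 for Population 1B.
    Countries without a mean for that population are skipped.
    """
    rows = [row for row in MATH_AGE_13 if row[population_index] is not None]
    ages = [row[1] for row in rows]
    means = [row[population_index] for row in rows]
    return pearson_r(ages, means)


print(f"Population 1A: r = {correlation_for(2):.2f}")  # Population 1A: r = -0.23
print(f"Population 1B: r = {correlation_for(3):.2f}")  # Population 1B: r = -0.38
```

A negative coefficient here means that countries with a later official age of entry tend to have lower national mean mathematics scores. With only 10 or 12 national means per correlation, the values are descriptive rather than inferential.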


DISCUSSION


Other data collected by IEA (Comber &
Keeves, 1973-1974) indicate that science is
spasmodically taught up to the age of 10 in
most schools. Thus, it may not be surprising
that the longer children remain at home
the higher their achievement on the IEA
science tests.

* Detailed definitions of Populations 1A and 1B
may be found in the original International Study
of Achievement in Mathematics (Husen, 1967)
document.


TABLE 3
OFFICIAL AND EFFECTIVE AGE OF ENTRY, MEAN SCORES, AND STANDARD
DEVIATIONS IN READING COMPREHENSION (AGE 14)

[Table values are not recoverable from the source. Countries listed:
Belgium (Flemish), Belgium (French), England, Finland, Germany
(Federal Republic), Hungary, Israel, Italy, Japan, Netherlands,
Scotland, Sweden, United States. Columns include OAE (years) and
EAE (years).]

Note. Abbreviations: OAE = official age of
entry, EAE = effective age of entry.

In the IEA cross-sectional surveys, there
was variability between countries and particularly
within countries on the age of entry
to school. The number of observations (in
this case, national mean scores) is limited,
typically to 12 or 13. Clearly the
factors associated with between-country
differences are manifold. Therefore, to examine
only the relationship of national age
of entry to later achievement is a highly
speculative venture. Nevertheless, the contradictory
results between reading and
mathematics are so unexpected that they
are worth presenting.

These differences in mathematics and
reading comprehension achievement are not
simply explained. Many factors influence
the mean level of performance and the variation
from the mean. One possible source
of variation is, as we have indicated, age of
entry, official or effective. It might be that
mathematics instruction reflects the effects
of early school intervention more than does
reading comprehension. Such an assertion
is based on the developmental sequence of
children proposed by Piaget and Montessori,
both of whom indicate that sensorimotor
learning precedes symbolic or abstract
learning. It is possible that the sensorimotor-play
activities of preschool transfer to
beginning mathematics skills more easily
than they do to the acquisition of reading
comprehension skills.

[FIGURE 1. Scattergram showing relationship between official age of entry
and mean scores for mathematics. Panels: Population 1a, r = -0.23, age 13;
Population 1b, r = -0.38, age 13. Axes: official age of entry (5, 6, 7)
by mean score. (Abbreviations: J = Japan, B = Belgium, N = Netherlands,
A = Australia, F = France, US = United States, Eng = England,
Sc = Scotland, Fl = Finland, Sw = Sweden, G (FR) = Germany (Federal
Republic), and Is = Israel.)]

[FIGURE 2. Scattergram showing relationship between effective age of entry
and mean scores for mathematics. Panels: Population 1a, r = -0.37, age 13;
Population 1b, r = -0.49, age 13. Axes: effective age of entry by mean
score. (Abbreviations: J = Japan, B = Belgium, N = Netherlands,
A = Australia, F = France, US = United States, Eng = England,
Sc = Scotland, Fl = Finland, Sw = Sweden, G (FR) = Germany (Federal
Republic), and Is = Israel.)]

In examining the mathematics and reading
results, one might suggest a specificity
of effect between mathematics achievement
and/or comprehension achievement and the
qualitative planning which goes into the
presentation of the material to the students.
Mathematics is generally judged to be a
more easily sequenced set of learning experiences
than is reading. Additionally,
mathematics is more commonly thought to
reflect the effects of in-school instruction
than is reading comprehension.

The effects of qualitative planning of
early childhood education programs have,
in recent years, been studied by many people.
These authors (see Stanley, 1972, Preschool
Programs for the Disadvantaged,
and Little and Smith, 1971, Strategies of
Compensation) and many others (see Chall,
1967, Learning to Read: The Great Debate)
conclude that to bring about qualitative
changes in early childhood education, it is
necessary to plan carefully for the attainment
of stated goals and objectives and to
evaluate the success of their attainment.
(These three books are based primarily
upon research in the United States.)
Another aspect of qualitative planning
concerns the “opportunity to learn” (or the
actual, as opposed to intended, curriculum).
In the IEA mathematics study, the rank
correlation between teachers’ ratings of opportunity
to learn and the mean score in
the countries for Population 1A is .96 and
for Population 1B it is .98. This high correlation
is what one would expect, and it
seems to strengthen the argument that degree
of emphasis (opportunity to learn) in
school does make a real difference in mathematics
achievement. It is unfortunate that
in the reading comprehension study, the
IEA authors did not collect any information
about opportunity to learn.

The quality of the homes from which children
come might also have an influence, although
it may vary somewhat from country
to country. The curriculum (the extent to
which the educational objectives embodied
in the materials used in school stretch the
children) as well as the amount of homework
will be important (cf. Postlethwaite,
[year illegible]).

One of the major findings of the total
IEA study (Postlethwaite, 1973) is that, on
average, variations between learning conditions
were associated with variations in
achievement in science and foreign languages
(and more so in higher grades),
whereas for reading comprehension “learning
conditions” effects were small. Furthermore,
it was discovered that in many countries
reading comprehension per se was not
systematically taught. Once children had
conquered the mechanics of reading, they
were left to their own devices concerning
further learning in the understanding of
texts. Hence, it is not surprising that
reading comprehension rebounds onto the
home and is then much more associated with
variations between homes than between
schools. This may well account for the slope
of 10-year-old scores in Figure 4.

The above interpretation is speculative,
even though other factors have been controlled.
The interpretation of the “effect” of
age of entry is even more speculative, given
no control of other factors. The problems
of how to control for other factors in between-country
comparisons have yet to be
tackled.

[FIGURE 3. Scattergram showing relationship between official age of entry
and mean scores for reading and science. Panels: Reading comprehension;
Science (r = 0.42, age 10). Axes: official age of entry (5, 6, 7) by mean
score. (Abbreviations: IT = Italy, US = United States, SC = Scotland,
B (FR) = Belgium (French), FL = Finland, ENG = England, H = Hungary,
B (FL) = Belgium (Flemish), N = Netherlands, IS = Israel, SW = Sweden,
J = Japan, and G (FR) = Germany (Federal Republic).)]

This review suggests that there may be 
some interaction between age of entry, sen- 
sorimotor activities, qualitative planning, 
and the effects of home background on the 
attainment of instructional objectives in 
mathematics. The same variables should 
be investigated for reading comprehension 
achievement. 

This secondary analysis exemplifies many of the difficulties inherent in such undertakings. First, to attack the specific problem of the effect of age of entry, a specially designed experimental study should have been conducted; hence, the IEA design is inappropriate. Second, the lack of degrees of freedom in the number of observations is a problem. These, and similar problems, will always face researchers using data banks created for purposes different from their own. Nevertheless, in view of time and financial constraints, properly executed secondary analyses of primary data are a valid and valuable method of speculative analysis such as that presented in this study.

Figure 4. Scattergram showing relationshi (panel: reading comprehension; horizontal axis: effective age of entry)


LESSON KINETIC STRUCTURE ANALYSIS AS RELATED TO
PUPIL AWARENESS AND ACHIEVEMENT


ROBERT J. BROWNE¹
Hunterdon County Department of Education, Flemington, New Jersey

O. ROGER ANDERSON
Teachers College, Columbia University


This study investigated the effects of variations in communication structure on knowledge acquisition and subjects' awareness of structure variation. Three transcripts differing in content structure as determined by kinetic structure analysis were developed. The amount of kinetic structure in a communication was the number of words held in common in pairs of contiguous statements. The 191 subjects were ninth graders. Three experimental groups were administered one of the three structure treatments. Data showed that student achievement is directly related to communication structure (p < .05). Subjects were not able to detect variations in lesson structure. A method of sensitivity training was devised which produced discrimination ability (p < .05). The theory-based, quantitative method of communication structure used herein yields significant effects on learning outcomes. Pupil assessment of teacher competencies in this domain may not be valid without preliminary sensitivity training.


Verbal communication accounts for a large proportion of classroom learning activity. Clearly the kind of content communicated and particularly the way it is organized can influence student knowledge acquisition and possibly affect student motivation to maintain sustained interest in the learning activity. The authors have performed a study of teacher verbal communication to determine the effects of variation in communication sequential organization on content acquisition and to examine the students' ability to assess variations in communication organization. This latter point is becoming of greater importance in the light of the movement toward student participation in teacher accountability models. This study investigated the basis for untrained observers, namely, high school students, participating in the evaluation of classroom teachers.

Previous research (Kissler & Lloyd, 1973; Natkin & Moore, 1972) on the effects of sequential organization on learning meaningful material has shown that criterion effects were observed most frequently when the verbal material contained interrelated sentences as opposed to free-standing sentences. A limitation of these studies, however, was that they did not provide a theoretical basis for variations in sequential organization nor did they use quantitative methods to assess variations in structure.

In this study, we used a content analysis system devised by Anderson (1970, 1971) to generate various amounts of organization in curriculum content while holding nearly constant the total amount of content presented. This system provides a quantitative method to assess the amount of organization in a verbal communication and was used to verify that criterion levels of structure in the stimulus content matter had been realized. We therefore gained careful control over variations in stimulus organization, which was the independent variable in our study. A summary of the theory underlying this method is presented as background for a more specific statement of the research problem.

¹ Requests for reprints should be sent to Robert J. Browne, Hunterdon County Department of Education, County Administration Building, Flemington, New Jersey 08822.

A basic measure of the amount of structure in a communication is the extent to which individual utterances are connected through linking ideas. If we consider a verbal communication to be composed of a series of statements approximately equivalent to a sentence in written discourse, then the amount of structure in the communication is defined in part as the amount of repeated substantive terms in pairs of contiguous statements in the communication. One or more repeated substantive terms (words) in contiguous statements produces linkage or continuity of thought in that part of the communication sequence.

For purposes of clarity and precision we call a single verbal utterance a discourse unit, and each of the substantive words (nouns, adjectives, and verbs representing objects, classes of things, or actions) within the discourse unit is called a verbal element. To assess the amount of structure in a communication, we examine each successive pair of discourse units in the communication and determine how many of the verbal elements in each pair are matched. From this, we obtain a coefficient of content organization called the fundamental coefficient. It is a measure of the amount of matching of verbal elements in consecutive pairs of discourse units. The concept of matching, or of linking of ideas as it were, between consecutive discourse units is called commonality. In Figure 1, the concept of commonality is illustrated for three consecutive verbal statements represented as rectangles containing words represented by alphabetic letters. The arrows between consecutive rectangles show the commonality in each pair.
The fundamental coefficient (B1) is a measure of commonality:

\[ B_1 = \frac{n_1}{n_0 + n_1} \]

where n1 is the total number of matched verbal elements in a discourse-unit pair, and n0 is the number of unmatched verbal elements. A B1 coefficient is computed for each discourse-unit pair. For example, in Figure 1, the first pair of discourse units contains a total of four matched verbal elements. There are two unmatched elements. Therefore, B1 = 4/(2 + 4) = .67. The second successive pair has two matched elements and three unmatched elements; thus B1 = 2/(3 + 2) = .40. When analyzing a transcript each verbal element is assigned a code number and is represented by that number each time it appears in successive discourse units.

Figure 1. A conceptual schema of commonality.
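The computation of the fundamental coefficient can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' coding procedure. It assumes that each discourse unit is given as a list of verbal element codes, that a code repeated within one unit counts once, and that a matched element is counted in both units of the pair (which is what makes the first worked example come to four matched elements). The function names and the letter codes are hypothetical; the letters in Figure 1 are not legible in the source.

```python
from typing import Hashable, Iterable, List


def fundamental_coefficient(unit_a: Iterable[Hashable],
                            unit_b: Iterable[Hashable]) -> float:
    """Return B1 = n1 / (n0 + n1) for one pair of contiguous discourse units."""
    a, b = set(unit_a), set(unit_b)
    n1 = 2 * len(a & b)   # matched elements, counted once in each unit (assumption)
    n0 = len(a ^ b)       # unmatched elements found in only one of the two units
    total = n0 + n1
    return n1 / total if total else 0.0


def pairwise_b1(units: List[Iterable[Hashable]]) -> List[float]:
    """B1 for every successive pair of discourse units in a transcript."""
    return [fundamental_coefficient(u, v) for u, v in zip(units, units[1:])]


def mean_b1(units: List[Iterable[Hashable]]) -> float:
    """Mean B1 over all successive pairs (0.0 if there are no pairs)."""
    values = pairwise_b1(units)
    return sum(values) / len(values) if values else 0.0


# Hypothetical letter codes chosen to reproduce the two worked examples:
# pair 1 has 4 matched and 2 unmatched elements, pair 2 has 2 matched and 3 unmatched.
units = [["A", "B", "C"], ["A", "B", "D"], ["B", "E"]]
coefficients = pairwise_b1(units)
print([round(c, 2) for c in coefficients])  # [0.67, 0.4]
```

Averaging the pairwise values with `mean_b1` gives a transcript-level figure of the kind reported as mean B1 in Table 1, under the same counting assumptions.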

In addition to B1, a second coefficient, B2, was produced. The coefficient B2 includes a potency factor, a ratio whose magnitude depends on the frequency of occurrence in the entire lesson of the unmatched elements in a discourse-unit pair. The B2 coefficient yields a graphical analysis of the transcripts called kinetograms, as explained in a previous publication (Anderson, 1971).

In addition to commonality as a variable in our research, we define a theme as being a linking idea that occurs repetitively throughout a section of the communication. The amount of theme activity is the frequency of occurrence of the linking idea divided by the total number of discourse units in that section containing the theme.


Purposes 


The purposes of this study were to (a) systematically vary the amount of commonality in a verbal communication; (b) determine the effects of this variation on acquisition of the communicated knowledge; and (c) determine whether students were aware of the variations in content organization. There is currently much interest in teacher accountability and in student evaluation of teacher performance. A clear need exists to obtain evidence about the kinds of teacher behaviors that actually influence student learning and to determine under what conditions, if at all, students can assess variations in relevant teacher performance.


Hypotheses 

The conceptual hypotheses of this study were the following: (a) Student content achievement is directly related to lesson content structure; and (b) awareness of commonality and theme development is directly related to lesson content structure.


TABLE 1
CHARACTERISTICS OF THE TAPED TRANSCRIPTS

Characteristic                 High transcript   Intermediate transcript   Low transcript
Mean B1                        .54               (illegible)               .22
Discourse units                88                88                        89
Coded verbal elements          74                74                        75
Verbal element frequency       427               428                       355
Elapsed time of tape (min.)    9.75              8.83                      7.07
Discourse units per minute     10                11                        13


METHOD


Preparation of Treatment Communications 


The characteristics of the three transcripts used 
to prepare tape-recorded communications are re- 
ported in Table 1. Discourse units per minute were 
computed by averaging the number of discourse 
units spoken in the corresponding minutes of the 
three transcripts. Sample Minutes 1, 2, and 5 
were chosen randomly with the aid of a table of 
random numbers. While the recorded transcript 
was played, the discourse units in the first, second, 
and fifth minutes were counted and totaled. The 
total was divided by three to obtain the average 
number of discourse units spoken per minute. This 
process was repeated for each of the three transcripts.

The high-kinetic-structure transcript (mean B1 = .54) was produced by making minor revisions to the transcript used by Trindade-Khristanand (1971).

The intermediate-kinetic-structure transcript used in this study was generated from the high-kinetic-structure transcript by reorganizing the discourse units. An […] content logic, that is, to have the script make sense when read by preserving the logical meaning of connecting words such as […]

Repetitious discourse units and statements […] organizing the intermediate transcript. It had a mean B1 of .22. Samples of the coded transcripts of the three transcripts used in this study are reported in Table 2. The difference in the amount of commonality, as measured by mean B1, is shown in Table 1.

Each of the transcripts was analyzed by plotting kinetograms to determine how consistent the level of structure was throughout the lessons. From these data we found each of the lessons […] of structure throughout.

To aid the students with the conceptualization of the scientific terms used in the transcripts, a diagram was provided. It consisted of a pen and ink drawing of the major structures of Rhizopus. Through conferences held with the elementary school science teachers it was ascertained that the subjects would be naive with reference to the content of the treatment sessions, namely, the structure and life cycle of the bread mold, Rhizopus. With this knowledge it was decided not to use a pretest and thus to reduce subject fatigue.


Preparation and Scoring of Evaluation 
Instruments 


Knowledge acquisition was measured by a 30-item, four-part, multiple-choice content achievement test. The test questions were designed to assess the subjects' knowledge of the structure and life cycle of Rhizopus. Trindade-Khristanand used the same test in his study and reported a test reliability of .89 (Trindade-Khristanand, 1971).

A questionnaire was prepared to assess pupil awareness of commonality. Statements selected for the final version had a Likert-type scale of one through seven. Statement A read: "In this lesson the same idea is repeated when moving from one statement to the next"; and Statement B read: "Each idea presented is closely related to the preceding idea."

The scales of the awareness questionnaire were scored according to where the subject placed a mark. Intervals along the scale were designated by indices every half point from 1.0 through 7.0. A mark anywhere in the interval between 3.0 and 3.5, for example, was scored as 3.0.
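The scoring rule for the awareness scales amounts to rounding a mark down to the nearest half-point index. A minimal Python sketch follows. It assumes a mark's position is recorded as a number between 1.0 and 7.0 and that a mark falling exactly on an index receives that index; the source does not state how boundary marks were handled. The function name `score_mark` is hypothetical.

```python
import math


def score_mark(position: float) -> float:
    """Score a mark on the 1.0-7.0 scale by rounding down to the nearest half point."""
    if not 1.0 <= position <= 7.0:
        raise ValueError("mark must lie on the 1.0-7.0 scale")
    # Doubling turns half-point intervals into unit intervals, so floor() picks
    # the lower index of the interval that contains the mark.
    return math.floor(position * 2) / 2


print(score_mark(3.2))  # 3.0, the example given in the text
```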


Treatment Groups 


Subjects were randomly assigned to treatment groups. The mean IQ of the 191 students who participated in the study was 105.82. An analysis of variance showed no significant difference among the means of the three treatment groups with respect to IQ. Treatment group sessions were held in the high school. The treatment session followed a standard protocol, namely, reading to the students the directions for the study and then playing the audiotape with the Rhizopus diagram in full view of the subjects. Next, the content achievement test was administered and then the awareness questionnaire.


Structure Awareness Training Methodology 


As reported in the Results section, the findings from the full-scale study on the questionnaire showed that the subjects failed to discriminate among the levels of structure. A collateral study was carried out to investigate the conditions necessary for subjects to make this discrimination. Several classes of biology students from the same high school participated in a study of the following factors which could have caused the subjects to fail in discriminating among the levels of kinetic structure in the transcripts.

The instructions for the awareness questionnaire directed the subjects to compare the recorded transcript with all the lessons they have ever had in their experience. This might have been too general a comparison. In order to get an indication of the strength or weakness of this comparison, two biology classes were presented with the first half of the recording of the low-kinetic-structure transcript (B1 = .22) and the first half of the high-kinetic-structure transcript (B1 = .54). In one class, the low-kinetic-structure tape was played first and in the other class the high-kinetic-structure transcript was played first. The students were directed to compare the taped transcripts to one another by using the same questionnaire as in the full-scale study. The results are reported in Table 7. This direct comparison of the low- and high-kinetic-structure lessons did not enable them to discriminate the levels of structure.

Another factor that might interfere with the subjects' discrimination among the levels of kinetic structure is the content of the statements of the questionnaire. In order to gain more information concerning this factor, two more classes were presented with the comparative evaluation task as outlined above. They were asked to listen to the first half of the two transcripts and then use the questionnaire of the full-scale study to compare them. Again the position of the transcripts in the treatment was alternated. This time before playing the tapes the students were given a brief introduction that explained commonality as the repetition of words in statements next to one another. The students were told that the two lessons differed on this point and that after hearing the tapes they would be asked to compare the lessons. The results are reported in Table 7. With this brief awareness training the students were able to discriminate the lesson kinetic-structure levels.


TABLE 2
SAMPLES OF CODED TRANSCRIPTS

Each entry gives the discourse unit number, the script, and the verbal element codes.

High-kinetic-structure transcript (verbal element codes illegible in the source)
1. Today we are going to study the history or life cycle of the bread mold, Rhizopus.
2. This mold, whose history we're going to investigate, is a plant.
3. We come to know a plant by carefully noting down its external appearance and characteristics, as here in the case of bread mold.
4. We will study how a plant grows, particularly how bread mold matures and reproduces itself.
5. There are two ways in which a plant reproduces itself: (a) sexually and (b) asexually.
6. The plant whose reproduction, both sexual and asexual, we are going to consider is a fungus called Rhizopus or bread mold.
7. We will study its external appearance and its reproduction, sexual and asexual.

Intermediate-kinetic-structure transcript
1. Today we are going to study the history or life cycle of the bread mold, Rhizopus. Codes: 1, 2
2. We come to know a plant by carefully noting down its external appearance and characteristics, as here in the case of bread mold. Codes: 3, 4, 2
3. We will study how a plant grows, particularly how bread mold matures and reproduces itself. Codes: 3, 5, 2, 6, 7
4. Bread mold does not have chlorophyll and represents a certain type of plant in the plant kingdom. Codes: 2, 11, 3, 12
5. There are two ways in which a plant reproduces itself: (a) sexually and (b) asexually. Codes: 3, 7, 8, 9
6. The plant whose reproduction, both sexual and asexual, we are going to consider is a fungus called Rhizopus, or bread mold. Codes: 3, 7, 8, 9, 10, 2
7. We will study its external appearance and its reproduction, sexual and asexual. Codes: 4, 7, 8, 9

Low-kinetic-structure transcript
1. Today we are going to study the history or life cycle of bread mold, Rhizopus. Codes: 1, 2
2. We come to know a plant by carefully noting down its external appearance and characteristics. Codes: 3, 4
3. We study plants by investigating how a plant grows, matures, and reproduces itself. Codes: 3, 5, 6, 7
4. Bread mold is a plant. Codes: 2, 3
5. Reproduction is the formation of new individuals having characteristics typical of their family, thus bread molds reproduce bread molds as cats reproduce kittens. Codes: 7, 4, 64, 2, 65, 66
6. There are two types of reproduction: sexual and asexual. Codes: 7, 8, 9
7. Sexual reproduction is brought about by the union of two nuclei in male and female sex cells. Codes: 8, 7, 67, 47, 68, 69, 70

Note. Verbal element codes are as follows:
1 History, life cycle
2 Bread mold, Rhizopus
3 Plant, plant body
4 External appearance
5 Grow, growth, develop
6 Mature, ripe
7 Reproduce, reproduction
8 Sexual, sex
9 Asexual
10 Fungus, mold, rust
11 Chlorophyll
12 Plant kingdom
47 Nucleus
64 Family
65 Cat
66 Kitten
67 Union, unite, fuse
68 Male, plus strain
69 Female, minus strain
70 Sex cell


TABLE 3
MEAN SCORES OF THE CONTENT ACHIEVEMENT TEST

Transcript      N     Mean test score   SD
High            65    14.26             3.
Intermediate    63    13.71             (illegible)
Low             63    12.81             3.156

The training procedure was repeated with two 
other classes but this time they only heard one 
tape, either the complete transcript of the low- 
kinetic-structure lesson or the complete tran- 
script of the high-kinetic-structure lesson. They 
were then asked to use the questionnaire of the 
full-scale study to compare the transcript with all 
the other lessons they had ever experienced. The 
results are reported in Table 7. With respect to
Statement A, students were even able to discrim- 
inate structure level by using the very general 
comparison of the full-scale study—all the lessons 
in their experience. 

All of these treatments were as similar to those 
of the full-scale study as possible, including the 
display of the Rhizopus diagram. The students were 
assigned to the various biology classes at random 
by a computer after they had been grouped by IQ 
and past performance in science. The 123 tenth- 
graders who participated in this study had an IQ 
range of 90-130, as measured by the Lorge-Thorndike test in the ninth grade. They had not yet studied the structure and life cycle of Rhizopus.


Data Analysis 


The data of the content achievement test were 
used to test the hypothesis that student content 
achievement is directly related to lesson kinetic 
structure. 

The data of the awareness questionnaire were 
used to test the hypothesis that the awareness of 
commonality and theme development is directly 
related to lesson kinetic structure. The data were 
analyzed by performing a one-way analysis of variance.

The data of the structure awareness training groups were evaluated by the t-test method. The acceptable level for the statistical tests was p ≤ .05.

RESULTS 


The first hypothesis on knowledge acquisition was supported. The results of the analysis of variance are reported in Table 4. The mean scores on the content achievement test fall in the predicted order (see Table 3).


TABLE 4
ANALYSIS OF VARIANCE OF MEAN SCORES OF THE CONTENT ACHIEVEMENT TEST

Source of variation    SS          df     MS
Between groups         68.633      2      34.317
Within groups          2095.128    188    11.144
Total                  2163.760    190

Note. Between groups, F = 3.079, F.05 = 3.05; df = 2/188, MS.


The second hypothesis predicted that the subjects would perceive the level of kinetic structure of the transcript they heard by comparing its commonality to all the lessons in their experience. Tables 5 and 6 show the results of this part of the study. Since both of the F values fall below the critical value for the stated acceptable level of p ≤ .05, the second hypothesis was rejected.

Table 7 contains the data of the structure awareness training study. The mean awareness value was computed by averaging the value assigned by the subjects to the two statements on the awareness questionnaire. The questionnaires were evaluated in the structure awareness training study in the same manner as was described for the full-scale study. Table 8 contains the results of the evaluation of the data of the structure awareness training study by use of the t-test method. It is evident from the means in Table 7 and the probability values in Table 8 that giving the subjects a direct comparison of the low- and high-kinetic-structure lessons did not enable them to discriminate the levels of structure. The subjects were able to perceive structure levels only after structure awareness training. For the most part, they were even able to discriminate structure level by using the very general questionnaire item containing a comparison of the treatment lesson with all other lessons in their experience.


TABLE 5
AWARENESS QUESTIONNAIRE DATA

Transcript      N     M       SD

Questionnaire Statement A
High            65    4.52    1.872
Intermediate    63    4.16    1.879
Low             63    4.24    2.123

Questionnaire Statement B
High            64    4.45    1.766
Intermediate    63    3.88    1.802
Low             63    4.51    1.466


TABLE 6
ANALYSIS OF VARIANCE OF AWARENESS QUESTIONNAIRE DATA

Source of variance     SS         df     MS

Questionnaire Statement A
Between groups         4.718      2      2.359
Within groups          722.557    188    3.843
Total                  727.275    190

Questionnaire Statement B
Between groups         15.054     2      7.527
Within groups          531.162    187    2.840
Total                  546.216    189

Note. For Statement A, between groups, F = .614; F.05 = 3.05; total df = 2/188, MS. For Statement B, between groups, F = 2.650; F.05 = 3.05; total df = 2/187, MS.
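The F ratios reported for the content achievement test and the awareness questionnaire follow from the tabled sums of squares and degrees of freedom: each mean square is SS/df, and F is the between-groups mean square divided by the within-groups mean square. The short Python sketch below recomputes them as an arithmetic check. The function name `f_ratio` and the dictionary labels are hypothetical; the numbers are the reported SS and df values, and 3.05 is the critical value given in the table notes rather than one computed here.

```python
def f_ratio(ss_between: float, df_between: int,
            ss_within: float, df_within: int) -> float:
    """One-way ANOVA F ratio: (SS_between / df_between) / (SS_within / df_within)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


# (SS between, df between, SS within, df within) as reported in the tables.
REPORTED = {
    "Content achievement test": (68.633, 2, 2095.128, 188),
    "Awareness, Statement A": (4.718, 2, 722.557, 188),
    "Awareness, Statement B": (15.054, 2, 531.162, 187),
}
CRITICAL_F = 3.05  # critical value quoted in the table notes

results = {label: f_ratio(*values) for label, values in REPORTED.items()}
for label, f_value in results.items():
    verdict = "exceeds" if f_value > CRITICAL_F else "falls below"
    print(f"{label}: F = {f_value:.3f} ({verdict} the critical value {CRITICAL_F})")
```

Under these inputs only the content achievement ratio exceeds the quoted critical value, which agrees with the first hypothesis being supported and the second being rejected.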


Discussion 


Teaching is a nontangible phenomenon 
existing as a series of events from which we 
deduce certain conceptual categories called 
teaching acts. The identification of struc- 
ture in teaching is not as straightforward as 
in material structure analysis. Therefore, 
analysis of structure in teaching requires 
even more careful definition and precise 
empirical indicators to enhance realization 
of reliable and professionally significant 
findings. The concept of structure used 
herein is based on clearly observable events 
in teaching and therefore may not account 
for all of the subtleties in organization considered significant in teaching. However, the added precision gained by careful empirical definitions should yield enhanced reproducibility and precise application of findings to daily classroom teaching practice. The hypotheses tested in this study concern very particular aspects of the teaching process. This particular focus, it is hoped, will provide a clearer understanding of the findings than would be possible with more abstract, less empirical structure variable definitions.

As reported in the Results section, the first hypothesis was confirmed; namely, that student content achievement will be directly related to lesson kinetic structure. The prediction of the outcome of the first hypothesis was based on a theory developed from biology and psychology (Anderson, 1971). Biologically, the theory holds that the organization of the nervous system of higher organisms is such that the organism is maximally sensitive to periodic stimulus input and that it forms relationships among similar stimuli occurring in close temporal succession. Psychologically, the theory holds that the learning process is enhanced when verbal communications contain linked verbal elements in contiguous verbal utterances, thus producing proactive facilitation, that is, the facilitation of subsequent acquisition through the presentation of verbal elements that cue the learner to anticipate repetition and hence linkage of ideas. Communications of high kinetic structure contain more repetition and hence linkage of contiguous verbal elements (commonality) than do communications of low kinetic structure. The results of this study show that the amount of knowledge acquisition was directly related to the amount of structure in the treatment lessons. One must realize, however, that too much commonality might cause a loss in acquisition of content because of boredom with excessive repetition. Conversely, if there are too few common verbal elements among contiguous discourse units the pupils may fail to associate the units in a series or expend so much work in supplying the links that they will become fatigued and knowledge acquisition will also be depressed. Therefore, in an efficient communication there must be a reasonable balance between commonality and the progressive introduction of lesson content.


TABLE 7
DATA OF THE STRUCTURE AWARENESS TRAINING STUDY

Structure awareness treatment group and no.
1. No awareness training; first half of low and high transcripts; low is second tape played.
2. No awareness training; first half of low and high transcripts; high is second tape played.
3. Awareness training; first half of low and high transcripts; low is second tape played.
4. Awareness training; first half of low and high transcripts; high is second tape played.
5. Awareness training; only complete tape of low transcript played.
6. Awareness training; only complete tape of high transcript played.

Column headings in the source: M, SD, Variance.
Questionnaire Statement A (legible entries in source order; alignment with groups and columns uncertain): 4.33, 2.91; 4.59, 1.81; 3.25, 2.37; 5.79; 4.06, 3.38.
Questionnaire Statement B: no entries legible.


TABLE 8
T-TEST ANALYSIS OF THE DATA FOR THE STRUCTURE AWARENESS TRAINING STUDY

The treatment groups are numbered as in Table 7.

Groups compared    Statement    t              df             p ≤ .05
1 and 2            A            (illegible)    (illegible)    (illegible)
1 and 2            B            .24            45             (illegible)
3 and 4            A            (illegible)    (illegible)    (illegible)
3 and 4            B            (illegible)    35             Significant
5 and 6            A            (illegible)    36             Significant
5 and 6            B            -.10           36             ns

The second hypothesis of this study, that awareness of commonality is directly related to lesson kinetic structure, was rejected. However, the results of the structure awareness training study show that subjects can be taught to discriminate among levels of kinetic structure. The cursory training consisted of defining commonality according to the concepts represented in Figure 1 of this article. The findings from this part of the study extend our knowledge of kinetic structure analysis by providing some information about how lesson kinetic structure is viewed by the learner. Moreover, these findings shed some light on possible errors in student ratings of teacher performance: the students may not be sufficiently aware of the variables to assess them validly. Thus, even though the students suffer a learning deficit when structure is low, as shown in this study, they sometimes cannot identify the source of the instructional flaw without some training in the assessment of instructional procedures. Much further research is needed to determine what kind of assessment training and what maturation effects influence student assessment of significant teacher performance variables.


EFFECT OF TWO FREE-TIME REINFORCEMENT
PROCEDURES ON ACADEMIC PERFORMANCE
IN A CLASS OF BEHAVIOR PROBLEM CHILDREN


DAVID MARHOLIN II, ELIZABETH T. McINNIS, AND TOM B. HEADS


University of Illinois at Urbana-Champaign and Herman M. Adler Center,
Champaign, Illinois


Three behavior problem children, ages 11-14, who had been placed 
in a state mental health facility, were subjected to baseline and two 
sequential reinforcement conditions (reading reinforcement and 
chance reinforcement). The results indicate that free time can func- 
tion as a reinforcer producing increases in the percentage of items 
answered correctly. Further, the results indicate that improvements 
can be affected in other academic subjects when only reading accuracy 
is reinforced. Finally, it was apparent that neither a reading reinforced 
nor a chance reinforcement condition produced consistently differential 
results, Results imply that a procedure which reinforces the qualitative 
aspects of academic performance can be effective in increasing the 
accuracy of academic performance. Moreover, these improvements in 
academic behavior were achieved using free time, a reinforcer which
is inexpensive and readily available to the teacher. It should also be
noted that a single 20-minute free-time period was effectively used to
produce changes in three academic areas.


Behavioral control techniques have been 
successful in increasing frequencies of task- 
oriented behaviors and decreasing incom- 
patible, disruptive behaviors in a variety 
of classroom populations. Madsen, Becker,
and Thomas (1968) showed that by ignoring 
inappropriate behavior and attending to ap- 
propriate behavior, the inappropriate be- 
havior of three elementary school children 
was effectively diminished. Similar results 
were obtained in classes of first-grade be- 
havior problem children (Hall, Lund, & 
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Jackson, 1968; Ward & Baker, 1968), with 
institutionalized female offenders (Meichen- 
baum, Bowers, & Ross, 1968), and with a 
retarded child (Brown, Montgomery, & 
Barclay, 1969). A study involving 28 well-
behaved children in a middle-primary pub-
lic school demonstrated that disruptive be-
havior resulted when the teacher withdrew
positive attention for appropriate classroom
behavior and was eliminated when contin- 
gent positive approval was reinstated 
(Thomas, Becker, & Armstrong, 1968). An 
experiment with five hyperactive children 
in a special class demonstrated that visual 
orientation toward the teacher could be con-
trolled by social reinforcement (Quay,
Werry, McQueen, & Sprague, 1966).
Token reinforcement programs have ef- 
fectively increased the frequency of task- 
oriented behavior in classrooms of behavior 
problem children (O'Leary & Becker, 1967; 
O'Leary, Becker, Evans, & Saudargas, 
1969). Walker, Mattson, and Buckley 
(1969) increased the on-task behavior of
fourth-, fifth-, and sixth-grade disruptive
children from a base rate of 39% to a
rate of 90% at the conclusion of a token 
reinforcement program. Broden, Hall, Dun- 
lap, and Clark (1970) demonstrated sim- 
ilar increases in on-task behavior from a 
baseline period (29%) to treatment con- 
ditions (74%) with seventh and eighth 
graders who displayed more than average 
amounts of disruptive behavior during 
baseline. In a series of single-subject ma- 
nipulations, Becker, Madsen, Arnold, and 
Thomas (1967) found that a combination 
of ignoring deviant behavior and reinforc- 
ing incompatible appropriate behavior was 
most effective in classroom management. 
Mattos, Mattson, Walker, and Buckley 
(1969) reported that reinforcement of ap- 
propriate behavior and punishment of de- 
viant behaviors with fourth-, fifth-, and 
sixth-grade behavior problem children was 
effective in producing behavioral change. 
Such studies have aptly demonstrated that 
attention, lack of attention, or punishment 
delivered contingently upon a response can 
be effective in altering the frequency of 
classroom behaviors. 

Several studies have investigated changes
in academic performance due to token re- 
inforcement systems. A 14-year-old cul- 
turally deprived juvenile delinquent in- 
creased more than two grade levels in read- 
ing achievement with an intensive one-to-
one token program over a four-and-a-half
month period (Staats & Butterfield, 1965). 
Nolan, Kunzelmann, and Haring (1967) 
reported similar results with eight junior 
high school students with serious learning 
and behavior disorders. Using a token rein- 
forcement system, Birnbrauer, Wolf, Kid- 
der, and Tague (1965) were successful in 
decreasing the percentage of errors in 10 of 
15 retarded subjects. Wolf, Giles, and Hall 
(1968) found significant increases in school 
grades of low-achieving fifth and sixth 
graders, as compared with a control group, 
when reinforcement was made contingent 
upon mastery of standard instructional ma- 
terials. In one of the largest token rein- 
forcement studies with emotionally dis- 
turbed children, Hewett, Taylor, and Artuso 
(1969) found increases in task-oriented be- 
haviors and standard achievement test 
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scores over those obtained by control groups. 
These studies demonstrate empirical sup- 
port for the effectiveness of token reinforce- 
ment programs to increase academic per- 
formance as a function of contingent 
payoffs. 

Most operant classroom studies, however, 
have concentrated on systematically in- 
vestigating the effects of reinforcement con- 
tingencies on disruptive and task-oriented 
behaviors. Classroom management tech- 
niques are a necessary precursor to a setting 
which is conducive to increasing academic 
skills and, indeed, many children are re- 
ferred for residential treatment because of 
social behavior problems in the school set- 
ting. However, the level of academic func- 
tioning of these children also is frequently 
found to be far below grade level. Thus, a 
program designed to accelerate academic 
behaviors incompatible with the referral 
problems is necessary. Establishing only ap-
propriate social behavior (e.g., sitting at
desk and holding a pencil), though neces- 
sary, is not regarded as sufficient, since these 
children cannot be returned to public schools 
unless academic as well as social deficits 
are remedied. Maximizing the probability 
of natural environmental reinforcers (Baer, 
Wolf, & Risley, 1968) requires that aca-
demic behavior, as well as social behavior,
be established. 

This study was designed to investigate 
two questions. First, is free time (as demon- 
strated by Osborne, 1969) a reinforcer, as 
indicated by its contingent effect upon ac- 
curacy of academic performance? Second, 
does reinforcement which is contingent upon 
reading performance produce an increase 
in other subject areas (mathematics and 
English) and if so, is the overall result of 
reinforcing one subject greater than the re- 
sult produced when reinforcement is un- 
predictably contingent upon performance in 
any one of the three subject areas? 


METHOD 


Setting 


The three subjects were residents in an in-pa- 
tient behavior modification program for emotion- 
ally disturbed children from 6 to 14 years of age. 
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They all attended special education classes in a 
school operated by the institution. A system of 
token reinforcement using marks which could be 
exchanged for a variety of back-up reinforcers was 
used in three classrooms. All six children in a 
classroom of four males and two females were in- 
cluded in the study. Their ages ranged from 11 to 
14 years. Five of the six children possessed a his- 
tory of acting out behavior in public school class- 
rooms prior to admission to the program. Their 
classroom problems typically included such be- 
havior as getting out of their seats without permis- 
sion, fighting, calling out, using verbal abuse, throw- 
ing objects, and refusing to work. The children 
were on the average three years below grade level 
as determined by norms for their chronological 
age. 


[...] a certified special education teacher who
[ass]isted [...]ing ai[de].


Design 


An A-B-C-A [...] (Sidman, 1960). Th[e] [...]
baseline (A), reading reinforced (B), chance rein-
forcement (C), and a final baseline (A).


Initial Baseline (A) 


The teacher maintained a record of the nu[mber]
of items attempted and the number of [...] [prob]lems
for each child during the reading, mathematics,
and English periods. The final 20 minutes of
the morning school session was designated as the
free-time period. During this time, students had
access to games, [...] materials, special [...]
with the teacher, teacher's aide, and other child[ren].
Individuals earned 20 minutes of free time at
[the] conclusion of the morning classes if they we[re]
[...] during the three a[cademic]
periods. Behaviors punished by time out [were]
classified into seven categories ranging on a
[con]tinuum from minor offenses (e.g., noncomp[liance],
talking without raising hand, moderate physi[cal or]
verbal abuse) to more serious ones (e.g., ste[aling],
assault on adults, major destruction, extortion[)].
Duration of time out was a function of the
seriousness of the offense, with times ranging
from 10 minutes for minor offenses to 50 minutes
for the most serious offenses. The time-out room,
a short [dis]tance from the classroom, was
4 x 7 feet in [...] and social stimuli, [...]
for minor offenses before the time-out c[...]
property). The use of time out was consis[tent]
throughout all conditions of the study.


TABLE 1


Median Accuracy Scores for Each Subject
During Initial Baseline


Note. The median accuracy scores served as
the criterion levels required for each subject to
earn free time.


Reading Reinforced (B) 


The median accuracy score for each child was 
determined from the data collected during the 
initial baseline (see Table 1). Free time was
earned by each child if during the morning reading
period he attained or surpassed his median ac-
curacy score. The teacher continued to collect the
rate and accuracy data in mathematics and Eng-
lish, although no specific contingency was based on
these data. The teacher told each child what his
accuracy criterion was and gave him an example
of how it was determined. Those children not earn-
ing free time were given a 20-minute study hall
period.
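
The reading-reinforced contingency can be sketched in a few lines of Python. This is only an illustration of the rule as described above; the function names and the baseline scores are hypothetical and are not taken from the study.

```python
from statistics import median


def criterion_from_baseline(baseline_scores):
    """Median accuracy (percent) across a child's initial-baseline sessions."""
    return median(baseline_scores)


def earns_free_time(reading_accuracy, criterion):
    """Free time is earned when the score attains or surpasses the criterion."""
    return reading_accuracy >= criterion


# Hypothetical baseline reading scores for one child (not data from the study).
baseline = [55.0, 70.0, 62.5, 80.0, 60.0]
criterion = criterion_from_baseline(baseline)
print(criterion)                         # 62.5
print(earns_free_time(62.5, criterion))  # True
print(earns_free_time(60.0, criterion))  # False
```

A child who does not meet the criterion would, under the procedure described, spend the 20 minutes in study hall instead.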


Chance Reinforcement (C)


At the conclusion of the morning academic
periods and immediately preceding free time, each
[child] [...]
the three morning subjects were written. The card
selected by the child designated the morning
academic subject (English, mathematics, reading)
upon which free time was contingent. The child
[had] to meet his accuracy criterion, which was de-
termined from the initial baseline (see Table 1).
The teacher told each child what his criterion was
in each subject and explained to him how he would
select the subject that would determine free time.
[If] during the chance reinforcement condition the
child was absent from any of the academic periods,
he drew from the remaining subjects. As in previ-
ous conditions, those not earning free time were
given a 20-minute study hall period.
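
A minimal sketch of the chance contingency follows, assuming the card draw is a uniform random choice among the subjects the child attended that morning. The names `draw_subject` and `earns_free_time`, the criteria, and the scores are all hypothetical and serve only to illustrate the rule.

```python
import random

SUBJECTS = ("reading", "mathematics", "English")


def draw_subject(attended, rng):
    """Draw one 'card' from the subjects whose period the child attended."""
    cards = [s for s in SUBJECTS if s in attended]
    if not cards:
        raise ValueError("child attended no academic period")
    return rng.choice(cards)


def earns_free_time(accuracy_by_subject, criteria, rng):
    """Return (drawn subject, whether the criterion was met in that subject)."""
    drawn = draw_subject(accuracy_by_subject.keys(), rng)
    return drawn, accuracy_by_subject[drawn] >= criteria[drawn]


# Hypothetical criteria and one morning's scores (not data from the study).
criteria = {"reading": 65.0, "mathematics": 50.0, "English": 45.0}
today = {"reading": 80.0, "English": 40.0}  # absent from mathematics
rng = random.Random(0)  # seeded only so the example is repeatable
print(earns_free_time(today, criteria, rng))
```

Because the child cannot know in advance which subject will be drawn, free time is secured only by meeting the criterion in every subject attended.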


Final Baseline (A) 


Procedures in the initial baseline were rein[stated].
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Data Collection Procedures 


The dependent measure used to evaluate the 
reinforcement procedures was the accuracy of each 
child’s performance in three academic subjects 
(reading, mathematics, and English). At the end
of each 45-minute academic period, the teacher 
recorded the number of items on each child’s work- 
sheet with complete or partial answers. Unanswered 
items which preceded items with answers were also 
counted as attempted. Accuracy was then com- 
puted by dividing the number of correct answers 
by the number attempted and multiplying by 100. 
Children worked at their own rate on open-ended 
assignments. Each child was given material suitable 
to his educational level and was allowed to ask 
for the teacher's assistance if he did not under-
stand the task.
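
The scoring rule described above can be sketched in Python. This is an illustrative reading of the procedure, not the authors' own tooling; the function names and the encoding of a worksheet as a list of "correct", "wrong", or None (unanswered) are assumptions made for the example.

```python
from typing import Optional, Sequence

# Hypothetical encoding: one entry per worksheet item, in order.
# "correct" / "wrong" = item has a complete or partial answer; None = left blank.
Item = Optional[str]


def count_attempted(items: Sequence[Item]) -> int:
    """Count answered items plus blanks that precede a later answered item."""
    last_answered = -1
    for index, item in enumerate(items):
        if item is not None:
            last_answered = index
    # Everything up to and including the last answered item counts as attempted.
    return last_answered + 1


def accuracy(items: Sequence[Item]) -> float:
    """Percentage correct: correct answers / items attempted * 100."""
    attempted = count_attempted(items)
    if attempted == 0:
        raise ValueError("no items attempted; accuracy is undefined")
    correct = sum(1 for item in items if item == "correct")
    return 100.0 * correct / attempted


worksheet = ["correct", None, "wrong", "correct", None, None]
print(count_attempted(worksheet))  # 4
print(accuracy(worksheet))         # 50.0
```

Note that the skipped second item counts as attempted because answered items follow it, whereas the two trailing blanks do not.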


RESULTS 


Subject 1 exhibited a mean increase of 
20% over his baseline level of reading ac- 
curacy during the reading reinforced con- 
dition (see Figure 1). This gain in perform- 
ance was maintained during the chance re- 
inforcement condition. Institution of the 
final baseline condition did not affect the 
level of reading performance achieved dur- 
ing the two reinforcement conditions. Al- 
though the English answers were not rein- 
forced, English performance showed a mean 
increase of 33% during the reading rein- 
forced condition (see Figure 2). When the 
chance reinforcement condition was intro- 
duced, there was an initial decrease in 
English accuracy followed by an increasing 
trend which stabilized at a level higher than 
that achieved during the reading reinforced 
condition. Institution of the final baseline 
produced a decrease in the level of English 
performance observed during the two rein- 
forcement conditions. The reading rein- 
forced condition also produced a mean in- 
crease in mathematics accuracy of 40% 
(see Figure 3). During the chance reinforce- 
ment condition, mathematics performance 
appeared to stabilize at a somewhat higher 
level than was observed during the reading 
reinforced condition. Mathematics accuracy 
decreased slightly and showed large varia- 
tions in the final baseline. The mean values 
for each subject and condition are presented 
in Table 2. 

The results for Subject 2 also indicated 
that there was a slight increase in reading 


Figure 1. Percentage of reading questions an-
swered correctly during each session. Axes: per-
cent of items correct by sessions. (Abbrevia-
tions: A = baseline, B = reading reinforced, and
C = chance reinforcement.)


accuracy during the reading reinforced con-
dition which was not maintained during the
chance reinforcement condition (Figure 1).
The downward trend in performance con-
tinued during the final baseline. As with


Figure 2. Percentage of English questions an-
swered correctly during each session. Axes: per-
cent of items correct by sessions. (Abbrevia-
tions: A = baseline, B = reading reinforced, and
C = chance reinforcement.)


Subject 1, English performance increased 
over the initial baseline level during the 
reading reinforced condition (see Figure 2). 
Institution of chance reinforcement initially 
maintained English accuracy at a high level 
with increased variability during the latter 
half of the condition. As with reading, these
gains were not maintained during the [final]
baseline condition. As noted with En[glish],
mathematics performance increased during
the reading reinforcement condition to [a]
near perfect level (see Figure 3). Institution
of chance reinforcement produced an im-
mediate decrease in accuracy to a lower
and more variable rate than that observed
during the reading reinforced condition.
There was some decrease in performance
when the final baseline was instituted.
As with the other two children, Subject 3


Figure 3. Percentage of mathematics problems
answered correctly during each session. Axes: per-
cent of items correct by sessions. (Abbrevia-
tions: A = baseline, B = reading reinforced, and
C = chance reinforcement.)


TABLE 2


Mean Accuracy Scores for Each Subject
During Each Condition


Academic subject | Baseline 1 | Reading reinforced | Chance reinforcement | Baseline 2
Subject 1
Reading          | 63.8 | 84.1 | 86.6 | 83.0
English          | 45.7 | 79.0 | 85.2 | 73.9
Mathematics      | 49.5 | 89.0 | 85.3 | 74.7
Subject 2
Reading          | 89.6 | 93.0 | 91.5 | 81.0
English          | 84.5 | 85.5 | 93.0 | 81.0
Mathematics      | 89.8 | 97.5 | 85.4 | 79.0
Subject 3
Reading          | 75.8 | 97.5 | 97.5 | 91.9
English          | 96.0 | 92.4 | 88.2 | 87.6
Mathematics      | 91.4 | 92.8 | 93.4 | 84.4
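
The mean increases quoted in the text (for example 20%, 33%, and 40% for Subject 1) appear to be differences in percentage points between the first two columns of Table 2. The short Python sketch below recomputes them from the tabled means; the variable and function names are illustrative only.

```python
# Mean accuracy (percent) from Table 2: (baseline 1, reading reinforced).
TABLE_2 = {
    ("Subject 1", "Reading"): (63.8, 84.1),
    ("Subject 1", "English"): (45.7, 79.0),
    ("Subject 1", "Mathematics"): (49.5, 89.0),
    ("Subject 2", "Reading"): (89.6, 93.0),
    ("Subject 2", "English"): (84.5, 85.5),
    ("Subject 2", "Mathematics"): (89.8, 97.5),
    ("Subject 3", "Reading"): (75.8, 97.5),
    ("Subject 3", "English"): (96.0, 92.4),
    ("Subject 3", "Mathematics"): (91.4, 92.8),
}


def gain(baseline, treatment):
    """Change in mean accuracy, in percentage points, rounded to one decimal."""
    return round(treatment - baseline, 1)


for (child, subject), (base, reading) in TABLE_2.items():
    print(f"{child} {subject}: {gain(base, reading):+.1f}")
```

On this reading, the only decline from baseline to the reading reinforced condition is Subject 3's English score, which is consistent with the account given in the text.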


exhibited an increase in reading accuracy 
(22%) during the reading reinforced con- 
dition (see Figure 1). This increase was 
maintained in the chance reinforcement con- 
dition. Some disruption in performance was 
observed during the final baseline. English 
performance was lower than the baseline 
level in the reading reinforced condition. 


Figure 4. Percentage of sessions in which the child equaled or surpassed his criterion
level, shown for each child by academic subject under baseline, reading reinforced,
chance reinforcement, and final baseline. (Abbreviations: Rdg. = reading, and Eng. = English.)
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Institution of chance reinforcement initially 
had no effect, but some increase in perform- 
ance was observed during the latter half of 
the condition. However, this gain was not 
maintained during the final baseline con- 
dition (see Figure 2). During the reading re- 
inforced condition, mathematics accuracy 
initially became variable but then stabilized 
at a somewhat higher level than that 
achieved during the initial baseline (see 
Figure 3). The level of performance, largely 
maintained during the chance reinforce- 
ment condition, declined during the final 
baseline. 

The percentage of the sessions during each 
condition in which the child achieved his 
criterion level was also analyzed (see Figure 
4). Subject 1 achieved his criterion level in 
reading more frequently during both rein- 
forcement conditions than he did during 
either baseline. This result was also con- 
sistent for the other two academic subjects. 
Criterion levels of mathematics and English
performance were achieved more frequently 
by Subject 2 during both reinforcement 
conditions than during baseline. During 


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TABLE 3 
Mean Number of Items Attempted During
the Initial Baseline and the
Reinforcement Conditions


Academic subject | Baseline | Reading reinforced | Chance reinforcement
Subject 1
Mathematics      | 11.2 |  31.3 | 29.3
Reading          | 26.5 |  34.8 | 35.1
English          | 23.3 |  28.6 | 18.6
Subject 2
Mathematics      | 98.0 | 121.5 | 42.5
Reading          | 11.4 |  23.8 | 31.4
English          | 35.6 |  22.8 | 23.7
Subject 3
Mathematics      | 38.3 |  44.9 | 44.9
Reading          | 20.2 |  17.7 | 22.3
English          | 43.4 |  43.8 | 37.2


reading, however, the criterion level was 
achieved less frequently during both rein- 
forcement conditions than it was during 
the initial baseline. Subject 3 achieved his 
criterion level in mathematics and reading 
more frequently during both reinforcement 
conditions than during baseline. Achieve- 
ment of his English criterion was less fre- 
quent during both reinforcement conditions 
than during initial baseline. It should be 
noted that the reading criterion for Subject 
2 was 95% and the English criterion for 
Subject 3 was 98%. It may be that the use of 
an extremely high criterion level adversely 
affected performance in these subject areas.
To determine whether the reinforcement 
conditions produced any decrease in the 
children’s work rate, the mean number of 
items attempted during each condition was 
computed for each child (see Table 3). 
These data indicate that there was no sig- 
nificant decrease in rate when accuracy was 
reinforced. In many cases there was an in-
crease in rate during the reinforcement con-
ditions.
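
The criterion analysis summarized in Figure 4 amounts to counting, within a condition, the sessions in which accuracy equaled or surpassed the child's criterion. A minimal Python sketch follows; the function name and the session scores are hypothetical, and only the 95% criterion echoes a value mentioned in the text.

```python
def percent_sessions_at_criterion(session_scores, criterion):
    """Percentage of sessions whose accuracy equaled or surpassed the criterion."""
    if not session_scores:
        raise ValueError("no sessions in this condition")
    met = sum(1 for score in session_scores if score >= criterion)
    return 100.0 * met / len(session_scores)


# Hypothetical session accuracies for one condition (not data from the study).
scores = [90.0, 96.0, 95.0, 88.0]
print(percent_sessions_at_criterion(scores, 95.0))  # 50.0
```

The example also shows why a very high criterion is hard to satisfy: sessions at 90% and 88% accuracy still count as failures against a 95% criterion.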


DISCUSSION


The results of this study indicate that 
free time can function as a reinforcer pro- 
ducing increases in the percentages of items 
answered correctly. Response generalization 
during the reading reinforced condition was 
observed with both Subject 1 and Subject 
2. In the case of Subject 1, generalization
was observed in both English and mathe-
matics, while generalization was noted in
mathematics but not in English for Subject
2. The lack of any generalized improve-
ment in Subject 3 may have been due to
his very high baseline levels in both English
and mathematics (96% and 91%, respec-
tively), which in effect left little room for
further improvement in these areas. The
occurrence of response generalization is of
particular interest, since the literature re-
porting generalization to similar but non-
reinforced responses is limited (Lovaas,
Koegel, Simmons, & Long, 1973; Schu-
maker & Sherman, 1970; Wheeler & Sulzer,
1970). Finally, it was apparent that neither
the reading reinforced nor the chance rein-
forcement condition produced consistently
differential results.
These results imply that a procedure 
which reinforces the qualitative aspects of 
academic performance can be effective in 
producing increases in academic accuracy. 
Moreover, these improvements in academic 
behavior were achieved using free time, a
reinforcer which is inexpensive and readily
available to the teacher. It should also be
noted that a single 20-minute free-time pe-
riod was effectively used to produce changes
in three academic areas.
The fact that the mean number of items 
attempted did not systematically decrease 
during the reinforcement conditions is some- 
what unexpected. It seems reasonable to 
expect that if the child were concerned with 
maximizing his chances to receive the rein-
forcer, he would attempt fewer items to
make sure that those he did were correct. 
This, however, did not seem to be the case. 
About the same number of items was at- 
tempted during reinforcement, but more 
were done correctly. Thus, with the [focus]
upon accuracy, the resulting improvement
was not achieved at the expense of speed. 
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A TEXT WITH ILLUSTRATIONS 
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Seventy-one fourth graders read an adapted version of a “Rupert 
Bear” story. In the experimental condition (text with illustrations), 
pictures and text occasionally provided more or less contradictory in-
formation. At retention testing (immediately, after a day, or after a
week), the experimental condition produced higher scores than the 
control (text without illustrations) for questions concerning exclu- 
sively pictorial information and for questions concerning correctly il- 
lustrated text contents. For questions concerning incongruously 
presented information, the experimental subjects selected more multi-
ple-choice alternatives representing picture input, while controls pre-
ferred alternatives representing textual input. No differences were
found for questions covering unillustrated text contents. Neither read-
ing time nor imagery ability was related to retention.


In view of the enormous number of illus- 
trated texts in use (e.g., in education), 
surprisingly few experimental studies have 
been performed to investigate the effect of 
illustrations on learning from reading ma- 
terial. Notable exceptions are some studies 
by Vernon in the early fifties (e.g., Vernon, 
1954), which showed hardly any effect, and 
a recent series of experiments by Dwyer 
(for a summary, see Dwyer, 1970), who
repeatedly came to the plausible but vague 
conclusion that some types of visuals are 
more effective than others in facilitating 
student achievement of specific educational 
objectives. 

One necessary and obvious approach to 
this area of study, as yet hardly explored, 
is the careful distinction between informa- 
tion present (a) exclusively in text, (b) 
exclusively in illustration(s), or (c) in both,
and the investigation of the fate of these 
three categories of information when re- 
tention is tested. Apart from the possibility 
of furthering insight into the kinds of ef- 
fects illustrations may have, this line of in-
vestigation has the additional advantage
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of profiting from recent findings in other 
fields of research where pictures are used as 
learning material. 

From these findings, then, certain effects 
of illustration seem likely. Many recent 
studies (e.g., Nickerson, 1965; Shepard,
1967; Standing, 1973) have shown that 
(complex) pictures are surprisingly well re- 
tained. Though most of these studies dealt 
with recognition of pictures as a whole 
rather than, for instance, with recall of 
elements from the pictures, it seems plaus- 
ible to assume that at least some of this 
impressive retention may also be found in 
recall tasks.

When retention of pictures and corre-
sponding names of common objects are
compared, pictures again are usually found 
to do surprisingly well (i.e., they are better 
retained than the words), this time both on 
recognition and recall measures and on
various kinds of learning tasks (for a re-
view, see Paivio, 1971). In the present con-
text, however, two findings somewhat con- 
fuse the issue. First, some studies (Du- 
charme & Fraisse, 1965; Jenkins, Stack, & 
Deno, 1969; however, see also Fraisse, 1970) 
have shown that picture recall may not be 
as impressive when children are used as sub- 
jects. A second difficulty is raised by the 
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outcome that, though pictures are generally 
found to be better retained than words, the 
relative retention of pictures and words, in 
comparison to either of these alone, is not 
altogether clear. Some authors (e.g., Bous- 
field, Esterson, & Whitmarsh, 1957; Du- 
charme & Fraisse, 1965) report better re- 
tention of items presented in both modes 
than of items presented verbally, but a re- 
cent study of Davies, Milne, and Glennie 
(1973) indicates that this may not gen- 
erally be found. 

In spite of these obscurities and on the 
basis of the experiments mentioned so far, 
one may expect by and large that some re- 
tention of pictorial content and some facil- 
itation in retention of pictured text ele- 
ments may both be obtained when memory 
of an illustrated instead of an unillustrated 
text is tested. 

Predicting the fate, at retention testing, 
of text elements not represented in the il- 
lustrations is less feasible on the basis of 
existing research results. On the one hand 
one could argue that text contents, while 
not featured in the pictures, might still be- 
come associatively connected to the illustra- 
tions, thus creating a kind of paired-as- 
sociate paradigm where the illustration 
functions as a stimulus term and the text 
elements as response term(s); the illustra- 
tion could, in this view, be considered as a 
kind of “conceptual peg” (Paivio, 1969). 
Since various authors (for a review, see 
Paivio, 1969) have shown that in such a 
set-up, words as response terms are better 
retained than the same terms in a design 
where pictures are replaced by correspond- 
ing words, one might accordingly expect 
some facilitation of the retention of the text 
elements concerned, in comparison to the 
same elements in a text without any illus-
trations. Some support for this facilitation
can actually be found in a study by Ket- 
cham and Heath (1962), though the claims 
made by these authors as to the significance 
of their results are not valid since their 
analysis of variance was not followed by 
individual comparisons between means. 

A contrary conclusion, however, might 
also be reached: Illustrations might well be 
a source of distraction (cf. Samuels, 1970), 
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or they might differentially favor elements 
from the text that are represented in the 
illustrations, at the cost of the remaining 
text elements—a phenomenon similar to the 
effect prequestions may have on the re- 
tention of question-irrelevant text content 
(Peeck, 1970). 

The two expectations are not mutually ex-
clusive, however. In an illustrated text, some
unillustrated elements from the text may 
associatively benefit from the illustrations, 
while the facilitation of these (and of the 
illustrated elements from the text) may be 
attained at the cost of other, unillustrated 
elements. 

To study the issues indicated above, sub- 
jects in the present experiment were given 
either an illustrated or an unillustrated 
text, while the retention test contained ques- 
tions on elements exclusively present either 
in the pictures, or in the text, or present in 
both pictures and text. To gain some in- 
sight into the relative contribution of text 
and pictures on recall, the learning material 
was so constructed that on some occasions 
text and picture provided the reader with 
more or less conflicting information; corre- 
spondingly, retention test items (multiple 
choice) were so designed that a subject 
could base his answer on the information 
given by either the text or the picture. 

The learning material consisted of an 
adapted version of a strip cartoon. In order 
to secure as natural a reading process as 
possible, subjects (9-10-year-old children) 
were not given learning instruction nor were 
they prepared for a retention test. The time 
they took to read the story was recorded so 
that a possible effect of the presence of 
illustrations on reading time might be ob- 
served. In order to study the regularity of 
the reading process, the progress of the 
children in reading the story was registered 
at fixed intervals. 

Testing for retention took place (a) im- 
mediately after completion of the story, 
(b) after one day, or (c) after a week. In 
this way it was hoped to gather some data 
on the course of forgetting of verbal and/or 
pictorial content, which, as some studies 
have suggested, may differ (cf. Bahrick &
Boucher, 1968; Shepard, 1967). 


METHOD


Material 


The learning material was based on the strip 
cartoon “Bruintje Beer in Dromenland” (“Rupert 
Bear in the Land of Dreams”). This story seemed 
a suitable basis for the investigation for several 
reasons: The pictures are fairly simple yet con- 
tain sufficient and sufficiently varied informa- 
tion, and the text constitutes the actual story 
which can easily be read without the illustrations. 

Originally the story consisted of 59 drawings, 
each of which was accompanied by a text unit of 
50-100 words. Of these drawings and text units, 37 
were selected for the experiment. The texts were 
then adapted so as to result in a story with a 
smaller range in the number of words (70-90) ac- 
companying each drawing. Minor changes in ac- 
cordance with the objectives of the experiment 
were then introduced in the text so that pictures 
plus text contained four categories of information: 
(a) exclusively provided by text (T); (b) ex- 
clusively provided by pictures (P); (c) congru- 
ously provided by text and picture, that is, ele- 
ments from the text correctly pictured in a draw- 
ing (P + T); and (d) incongruously presented by 
text and pictures (P X/ T). 

The last category contained, on the one hand, 
instances where, though text and accompanying 
picture did not agree, the information provided 
by the one was not logically incompatible with 
information given by the other (P/T); for exam-
ple, on one occasion it is stated in the text that
Rupert Bear is walking through a wood, whereas 
the accompanying picture shows him walking 
along a country road: The data seem contradic-
tory, but may, with a little good will, be combined;
after all, text and picture may deal with different 
instances of Rupert's behavior. In some other cases, 
however, the information in the picture was logi- 
cally virtually irreconcilable with the content of the 
text (P x T); for example, when, according to the 
story, a certain character (the hippopotamus) per-
forms a unique action (he pulls the tablecloth off a
[...]), whereas the accompanying picture
shows the same action being ex[ecuted by so]meone
else (the king).

[Th]e 37 text units, thus (re)constructed, [were]
put under the 37 pictures, xeroxed, and [...]
new booklet. The unillustrated text units were
made into a similar new booklet, Forty multiple- 
choice questions, each with 4 alternatives, were 
constructed. There were 8 T questions, 9 P ques-
tions, 8 P + T questions, and 15 P X/ T questions
(7 P/T and 8 P x T). 

In order to find out the extent to which subjects
based their responses on textual or pictorial infor-
mation, respectively, the 15 P X/ T questions con-
tained both an alternative corresponding to the
text and an alternative corresponding to the pic-
ture.
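
The scoring of the incongruous questions can be illustrated with a short Python sketch, assuming each such question is keyed by the alternative matching the text and the alternative matching the picture. The function names, question labels, and responses below are hypothetical.

```python
from collections import Counter


def classify_choice(choice, text_alt, picture_alt):
    """Label a chosen alternative by the source it corresponds to."""
    if choice == text_alt:
        return "text"
    if choice == picture_alt:
        return "picture"
    return "other"


def tally_sources(responses, answer_key):
    """Count text-, picture-, and other-based answers over incongruous items."""
    return Counter(
        classify_choice(responses[question], text_alt, picture_alt)
        for question, (text_alt, picture_alt) in answer_key.items()
    )


# Hypothetical key for three incongruous questions:
# question -> (text alternative, picture alternative).
answer_key = {"q1": ("a", "c"), "q2": ("b", "d"), "q3": ("d", "a")}
responses = {"q1": "c", "q2": "b", "q3": "a"}  # one child's made-up answers
counts = tally_sources(responses, answer_key)
print(counts["picture"], counts["text"], counts["other"])  # 2 1 0
```

Comparing such tallies between the illustrated and unillustrated conditions would show which source each group relied on.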


Care was taken to spread the questions from
the various categories as evenly as possible over
the 37 picture-text units.

All children were given in advance a [test of]
reading ability (a subtest of a Dutch intelligence
test, the Interesse-Schoolvorderingen-Intelligentie
Reeks by Snijders, Souren, & Welten, 1968) [and a]
test of imagery ability, the Memory for Designs
Test designed by Graham and Kendall (1960; cf. Di
Vesta, Ingersoll, & Sunshine, 1971); this made it
possible to check the equivalence of the conditions
within each school. The Memory for Designs Test,
in which the subject has to draw figures shown [for]
five seconds from memory, was chosen because it
requires the subject to hold a percept in memory
(an activity possibly related to retaining informa-
tion from illustrations), without the necessity to
manipulate (e.g., rotate) it. The items of the test
were enlarged and copied on cards for group presen-
tation. Two independent judges scored the repro-
ductions according to the directions given by
Graham and Kendall; interrater reliability for all
schools together was .85. For each subject, the
scores given by the two judges were added together
and the resulting scores were used for further com-
putations.


Subjects and Design 


Subjects were 71 children, 9-10 years old, in three
fourth grades of three elementary schools (A, B,
and C). In each school, about half the children
(the experimental condition) were presented with
the illustrated text, while the other half (the
control condition) read the text without pictures. The
children were randomly assigned to the conditions,
with the restriction that within each school an
attempt was made to get about an equal male-female
ratio in each condition. The children in
School A were tested for retention immediately
after they had read the material; in Schools B
and C, unexpected testing took place after one
day and one week, respectively.

Neither the reading test nor the Memory for
Designs Test showed significant differences between
the conditions within a school; on both tests,
however, the children in School B obtained generally
lower scores than the children in either of the
other schools.

On the retention test measures, 3 X 2 (Schools
X Conditions) analyses of variance were performed;
possible differences between the schools
were not further analysed, since, due to the
confounding of the time variable with possible
differences in intellectual level between the schools,
they would be hard to interpret. Differences
between experimental and control conditions were
tested by t tests, one-sided for the scores dealing
with retention of purely pictorial material (P in
Table 1, and P alternatives in Table 2) and
two-sided for all other scores.
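
As a rough check on the design just described, the degrees of freedom of a fully crossed between-subjects 3 X 2 analysis can be worked out from the sample size. The sketch below assumes that all 71 children contributed a score to each analysis and that the ordinary fixed-effects partition applies; the function name and the dictionary keys are illustrative and do not come from the original report.

```python
def two_way_anova_df(n_subjects, levels_a, levels_b):
    """Degrees of freedom for a fully crossed between-subjects two-way ANOVA.

    Assumes every subject contributes exactly one score and that every
    cell of the design contains at least one subject.
    """
    df_a = levels_a - 1                            # main effect of factor A
    df_b = levels_b - 1                            # main effect of factor B
    df_ab = df_a * df_b                            # A x B interaction
    df_error = n_subjects - levels_a * levels_b    # within-cell error
    return {
        "Schools": df_a,
        "Conditions": df_b,
        "Schools x Conditions": df_ab,
        "error": df_error,
    }


# 71 children, 3 schools, 2 conditions (illustrated vs. unillustrated text)
df = two_way_anova_df(n_subjects=71, levels_a=3, levels_b=2)
print(df)
```

Under these assumptions the Conditions effect is tested on 1 and 65 degrees of freedom, and the Schools effect and the interaction on 2 and 65.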


Procedure 


The experiment was run by two experimenters
working simultaneously in order to make the duration
of the experiment in a school as short as possible
and to prevent the children from discussing
the learning material. The children were tested in 
groups of 4 or 5, homogeneous as to the condition; 
they were seated at separate tables and told they 
would be given a booklet to read the way they 
would do it when reading for pleasure at home. It 
was stressed in the instruction that there was no 
need to hurry, everybody could read at his own 
rate. The children were then given a booklet and 
an empty sheet (to draw on if they had finished be- 
fore the others), and they were told they could 
start reading. The experimenter noted at two-min- 
ute intervals which page of the booklet each child 
had reached. 

After all the children had finished the story and 
the booklets had been collected, the children were 
asked, by means of a questionnaire, if they had 
read "Rupert Bear" books before, and if they 
knew this particular one. (On the basis of the 
responses to these questions, six children who pre- 
sumably had read the story before were removed 
from the original sample.) 

Sheets for the 40-item retention test were then 
handed out to the subjects in School A; in Schools 
B and C, testing took place after one and seven
days, respectively. The test was introduced as a 
test of “what you remember of what you have 
read”; the multiple-choice principle was explained 
to the children, and they were asked to answer all 
questions and to guess if they did not know the 
answer.


RESULTS 


Table 1 deals with the retention of unillustrated
(T) and correctly illustrated
(P + T) text contents and of information
exclusively shown in the pictures (P).

In all schools, retention of T contents appears
to be somewhat higher in the experimental
than in the control condition, though
the overall effect is not significant (F =
2.30, df = 1/65, .10 < p < .20). The presence
of the illustrations thus seems to have
had a slight facilitative effect on the retention
of unillustrated elements from the
text. The differences between the schools
are significant (F = 3.25, df = 2/65, p <
.05), but no further analysis or interpretation
of this finding will be attempted, for
reasons stated above; there was no Schools
x Conditions interaction (F < 1.0).

Subjects in the experimental condition 
retained significantly more P + T contents 
than subjects in the control condition (F =
11.37, df = 1/65, p < .005). As Table 1
shows, further analyses within the schools 
reveal significant differences in Schools B 
and C but no significant difference in School 
A. This suggests that the facilitative effect 
of illustration on the retention of text ele- 
ments is primarily a phenomenon of delayed 
retention testing. Here too the differences 
between the schools were significant (F = 
5.49, df = 2/65, p < .01), whereas the inter- 
action was not (F = 1.38, df = 2/65, ns). 

Finally, the P questions were, as could be


TABLE 1 


Mean Retention and Standard Deviations of Unillustrated Text Contents (T),
Picture Contents (P), and Correctly Illustrated Text Contents (P + T) at
Immediate Testing, Testing After One Day, and Testing After One Week


School A (immediate) | School B (one day) | School C (one week)
Contents | Exp | Con | t difference | Exp | Con | t difference | Exp | Con | t difference
n | 10 | 11 | | 16 | 14 | | 9 | 11 |
^ 5.22 | 4.55 
x 6.00 | 5.45 4.81 | 4.2 ; : ie 
m 118 | 130 | 2:0 | 150 | 1.86 95 | i80 | 1.67 
x 4.36 6.25 | 3.29 v | 6.56 | 8.00 | og 
ae $49 | tog | 285% | 1.56 | 1.03 6.00* | 2106 | 1.95 
+T 
x 4.70 | 4.27 5.44 | 3.43 | ger | 3-89 | 209 | 249" 
SD 14 | 1.42 | 9 | 1.94 | 1.92 E 1.82 | 1.08 


Note. Abbreviations: Exp = text with illustrations; Con = text without illustrations.
* p < .05.
** p < .01.


TABLE 2


Alternatives Corresponding to Text (T) or to Pictures (P), on Multiple-Choice
Questions Dealing with Information Presented More (P x T) or Less
(P/T) Incongruously by Text and Pictures

School A (immediate) | School B (one day) | School C (one week)
Exp | Con | t difference | Exp | Con | t difference | Exp | Con | t difference
P/T 
fictile Xx 2.90 | 1.09 | 5 75€ | 2.56 | 1.14 | 4.29% 2.56 | 1.27 
SD | 1.30 | .90 | * 86 | e 126 rm 
150 | 4.09 331 | 4. 38 | 3. 
TS 1n iso | 150 | -9 | ^98 | 156 | 2-78 | “coy | 02 
PXT B 
i X |a | 1.27 $80 | 2.79 |; oor | 478 | 2-63 |a gg 
di BD) eus ro 5 on 1:88) 61:91. |. 1-90 1.03 | 98 
3.00 | 4.64 275 | 2.70 arx 
xe sp | 1.18 | 1.07 |") 58s | 115 | 79 | 1:52 | 98 
Total P x/T 
i .00 | 2.36 6.38 | 3.93 7.33 | 3.64 | 4 am 
pie p^ 2n | cer |^"| 25 | rm |*9" zn | 1.61 p 
" 6.80 | 8.73 6.00 | 6.93 i 5.36. | 4.99 
a SD | 2.08 | 2.09 | 27 | reo | 2:31 |} | 178 | 1.49 
Note. Abbreviations: Exp = text with illustrations; Con = text without illustrations. 
* p < .05.
** p < .01.


expected, answered correctly significantly 
more often in the experimental than in the 
control condition (F = 37.12, df = 1/65, 
p < .001), a superiority found, as the t 
tests show, in all schools. Differences be- 
tween the schools were not significant (F = 
1.29, df = 2/65, ns) nor was the interaction 
(F = 1.97, df = 2/65, ns). Interestingly, 
the retention of pictorial content shows, as 
can be seen in Table 2, little deterioration 
as the interval between learning and testing 
increases. However, as was said before, due 
to the lack of equivalence of the schools, 
this finding should be viewed with caution. 

Table 2 shows the result on items dealing 
with information more (P x T) or less 
(P/T) incongruously presented in text and 
picture. 

On both the P/T and P x T questions,
subjects in the experimental condition chose
significantly more often than control subjects
the alternative corresponding to the
picture content (P alternative) (F = 30.68
and F = 37.64, respectively, with df = 1/65
and p < .001 in each case). There are no
significant differences between the schools
(F < 1.0 and F = 2.36, respectively, with
df = 2/65), nor were the Schools x Conditions
interactions significant (F < 1.0
and F = 2.54, with df = 2/65). As can be
seen in Table 2, in all schools, t tests between
experimental and control conditions
were significant on at least the .05 level.
On the alternatives corresponding to the
information provided by the text (T alternatives),
a significant difference between
the conditions was only found for the P x
T questions (F = 6.98, df = 1/65, p < .05);
for these T alternatives, there were highly
significant differences between the schools
(F = 9.78, df = 2/65, p < .001), while the
interaction also reached significance (F =
3.36, df = 2/65, p < .05). As is shown in
Table 2, only on immediate testing (School
A), control subjects chose significantly more
T alternatives on P x T questions. On the
other category of questions (P/T), the difference
in choice of T alternatives between
experimental and control conditions was
almost significant (F = 3.38, df = 1/65,
.05 < p < .10); here neither the differences
between the schools nor the interaction was
significant (both times F < 1.0).

So far the results on the P/T and P x T
questions have been treated separately,
since, as was explained above, these categories
differed in the degree of conflict between
textual and pictorial information involved.
However, as it is not known whether
subjects actually experienced this distinc- 
tion, it is perhaps advisable to also analyse 
these categories together (P x/T, see Table
2). Analysis of variance of the P alterna- 
tives then reveals a highly significant dif- 
ference between the conditions (F = 56.81, 
df = 1/65, p < .001); neither differences 
between the schools nor interaction reach 
significance (F = 1.10 and F = 1.77, respec- 
tively, with df = 2/65). The t test of the 
difference between the experimental and 
control conditions is significant beyond the
.01 level for each school (see Table 2). On
the T alternatives, a significant difference 
between the conditions is found (F = 7.82, 
df = 1/65, p < .01); this time the schools 
differ significantly as well (F = 4.40, df = 
2/65, p < .05), while there is no significant 
interaction (F < 1.0). Additional t tests
show that on immediate testing only, a sig- 
nificant difference between the conditions is 
obtained (in favor of the control condition). 

Taken together these results indicate that 
the exposure of subjects in the experimental 
condition to both pictures and text had a 
powerful effect on their behavior on the re- 
tention test: They chose the alternative 
corresponding to the pictorial information 
far more frequently than the subjects who 
had only seen the text, for whom, it should 
be realized, the P alternatives were merely 
one of three (incorrect) response alterna- 
tives. This preference is probably respon- 
sible for the results on the T alternatives, 
which the experimental subjects selected less 
frequently than the control subjects. 

It is worth noticing further that with in- 
creasing retention intervals, the frequency 
of selection of P alternatives remains about 
the same, whereas the frequency of selection
of T alternatives shows a steady decline.

The children showed considerable variation
in reading time. In two of the schools,
subjects in the experimental condition completed
the story on the average somewhat
faster than the controls and in the third
school somewhat slower. However, none of
these differences was significant (in Schools
A, B, and C, t = -1.85, 1.66, and -1.39,
respectively, all ps > .05). There is, therefore,
no indication that the presence of the
pictures affected the inspection rate of the 
children in any systematic way. 

In order to explore the regularity of the 
reading rate over the story, in both con- 
ditions in each school, product-moment cor- 
relations were computed between total read- 
ing time and the number of pages read after 
8 minutes. This point was chosen because 
the average reading time of all children was 
about 16 minutes, and it therefore could be 
assumed that after 8 minutes the subjects 
were about halfway through the story. For
the experimental and control conditions,
respectively, this correlation was as follows:
School A, -.86 and -.97; School B, -.82
and -.83; and School C, -.92 and -.88.
Inspection rate thus appears to have been
very regular and, in this respect too, it
seems unaffected by the presence or ab- 
sence of pictures. 
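
The product-moment correlation used here can be sketched in a few lines. The data below are invented purely for illustration (the individual reading times were not reported), and the function name is an assumption; the point is only that children who need more total time have read fewer pages after 8 minutes, so a regular reading rate shows up as a strongly negative coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five children (not data from the experiment)
total_minutes = [12, 14, 16, 18, 20]     # total reading time
pages_at_8_min = [25, 21, 19, 16, 15]    # page reached after 8 minutes

r = pearson_r(total_minutes, pages_at_8_min)
print(round(r, 2))
```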

Correlations were also computed between 
the various retention measures on the one 
hand, and reading time, the scores on the 
reading test, and the scores on the Memory 
for Designs Test on the other. No systematic 
correlations were found for either reading 
time or imagery scores. Only the scores on 
the test of reading ability showed consist- 
ently positive and somewhat substantial 
correlations with scores on T questions, and 
T alternatives on P x/T questions.


DISCUSSION 


The results of this experiment clearly 
show that the presence of illustrations had 
several effects on retention: On questions 
covering exclusively pictorial content (P) 
and on questions dealing with information 
uniformly presented by both text and il- 
lustrations (P + T), subjects who read the 
illustrated text scored significantly higher 
than subjects who read the text without il- 
lustrations; on items covering information 
incongruously presented by text and illustration
(P/T and P x T), they checked
the alternative corresponding to the pictorial 
content far more frequently than the sub- 
jects who only read the text, whereas the 
latter tended to select the alternative corresponding
to the text more often. Though
no significant difference was obtained for 
items testing retention of information pre- 
sented through the text only (T), an indica- 
tion of some facilitation in the illustrated 
text condition was found in all schools. 

In relating these results to educational 
practice, it should be noted that the learn- 
ing material used was probably more con- 
ducive to pictorial retention than illustrated 
textbooks used in education generally are. 
Though the text of this strip cartoon carried 
the actual story, the pictures continuously 
provided the reader with some visualization 
of its contents, thus reinforcing the tendency,
already natural in reading strip cartoons,
to pay attention to the illustrations.
As a result, the retention of the pictorial 
material in the present experiment may be 
somewhat inflated. However, there seems 
as yet little theoretically founded reason 
to expect radically different phenomena 
when retention of illustrated textbooks (for 
this age group) is investigated with methods 
similar to the ones applied in this study, 
although the effect of illustration will prob- 
ably turn out to be less marked. 

Due to differences between the schools, 
no definitive conclusions can be drawn con- 
cerning the existence of systematic differ- 
ences in the course of forgetting of pictorial, 
as compared with verbal content. There are, 
however, several indications in the data that 
retention of pictorial content might suffer 
less from an increase of the interval be- 
tween learning and testing than retention 
of verbal content. This would be in agree- 
ment with results obtained in the studies, 
referred to above, which showed the im- 
pressive, and relatively stable, long-term 
retention of pictures. 

The findings on the items dealing with 
incongruously presented information re- 
quire some further comment. They show 
that subjects in the illustrated text con- 
dition tended, on the average, to select more 
alternatives congruent with the pictures 
than alternatives congruent with the text. 
This raises the question whether subjects, 
when selecting P alternatives, were con- 
sciously basing their answers on pictorial 
information, and whether they were (either
on testing or before, while reading) aware
of contradictions between text and pictures.
In this respect, it should be noted that the
children in the illustrated text condition
were relatively free to base their answer on
either source of information, relatively free
because the retention test was introduced
as a test “to see what you remember of
what you have read.” Furthermore, in educational
practice, pictures in textbooks often
tend to be neglected as a legitimate source
of knowledge, and testing often deals exclusively
with verbally presented information.
It may therefore be argued that, also because
of the scholastic context, the children
in the present experiment considered the
retention test as primarily a test of their
retention of the text. Accordingly, the children
in the experimental condition probably
tried to base their answer either on what
they thought they had read in the text, or,
in the case of conscious reference to the
pictures, assumed that text and pictures had
provided the same information. This would
imply that little awareness of contradiction
was raised in initial learning, or, if raised,
was on testing largely forgotten or overridden,
due to the superior retention of
pictorial content. This interpretation is supported
by the observation that the children
did not show any sign of awareness of the
presence of conflicting information in the
reading material, either on initial reading
or on testing.

In order to obtain more data on these
matters, several lines of investigation may
be tried. One line, for example, would be to
emphatically instruct the children to base
their answers on the retention test exclusively
on information provided by the text.
Another line would be to see whether subjects
are capable of distinguishing between
pictures and text as the source for answering
test items; this might give some indication
of the way the information from both
sources is stored (cf. Sasson, 1971). Apart
from this, it would be useful to find out, for
example, by postexperimental questioning,
to what extent children perceived inconsistencies
in the learning material.

It is not clear how the absence of systematic
differences in reading time between the
illustrated and unillustrated text conditions
should be interpreted. It could mean that,
contrary to what was suggested above, the
children who read the illustrated text paid
relatively little attention to the pictures;
otherwise their total inspection time would,
on the average, probably have exceeded
the reading time in the other condition (cf.
Dwyer, 1968). It is more likely, though,
that the time spent on the illustrations was
compensated for by a more rapid reading
of the text itself. This could be due to
motivational causes (reading was more fun,
more curiosity was raised, etc.), but it is
also possible that the story could be more 
easily understood, and therefore read more 
rapidly, when accompanied by pictures. The 
latter assumption is in accordance with the 
small but consistent difference in retention 
of unillustrated text contents (T), shown 
in Table 1. 

The absence of correlations between the
imagery test scores and the retention test 
scores is not too surprising. It may indicate 
that bringing to mind pictures and text of 
a story read up to a week before, involves 
quite different aspects of imagery ability 
than having to copy from memory geomet- 
rical figures, seen a few seconds before. 

Finally, mention should be made of an
additional analysis of the data not originally
intended. As some recent studies (e.g.,
Marks, 1973) suggest, there may be systematic
sex differences involved in picture-memory
experiments. It seemed therefore
worthwhile to find out whether differences
of this kind might have occurred in the present
experiment. To this purpose, the results
of the boys and girls were compared (all
schools combined) for the experimental and
for the control condition. No differences
were found except, in the experimental condition,
on items dealing with incongruously
presented information (P x/T). On these
items, girls tended to choose answers corresponding
to the pictorial content (P alternatives)
more often than boys (t = 1.70,
df = 33, .05 < p < .10), whereas boys were
more inclined than girls to select answers
corresponding to textual information (t =
2.00, df = 33, .05 < p < .10). This might be
taken as support for the suggestion by Ernest
and Paivio (1971) that “in some tasks
females ‘use’ imaginal processes to facilitate
recall whereas males do not [p. 71].”


Memory span thresholds were obtained for second-grade, sixth-grade, and
retarded children over seven classes of materials in both auditory and visual
modalities. One of the tasks involved a procedure designed to force “chunking.”
The major conclusions were as follows: (a) a clear developmental effect was
observed which generalized across materials and modalities; (b) the retarded
subjects were consistently inferior to normal children; (c) auditory presentation
produced better recall than visual presentation; (d) relatively high intercorrelations
were observed among the various thresholds for all subject groups, but
particularly for the older normal children; (e) forced chunking produced longer spans,
but not differentially across groups; and (f) retarded subjects, as contrasted with
normal children, may employ different processing strategies.


A task most commonly used to measure
short-term retention capacity is the venerable
memory span test, particularly memory
for digits. This task is assumed to be a good
measure of the memory component of
intelligence (e.g., Scott & Scott, 1968) and,
in fact, digit spans correlate well (.60 to .75)
with Stanford-Binet and Wechsler IQs.
Additionally, sequential short-term memory
processes have figured prominently in
various theoretical accounts of retarded
behavior (e.g., Ellis, 1970; Jensen, 1970;
Spitz, 1973).

The process which mediates the ability to
remember a subspan sequence of items has
been described by a variety of terms including
apprehension span, short-term memory,
and channel capacity, among others. By
whatever name one wishes to call it, there
are some descriptive facts about this process
which are especially pertinent to theorizing
in the area of mental retardation: (a)
digit span thresholds are positively correlated
with mental age; (b) immediate
memory span depends upon the type of
material (Cavanagh, 1972); (c) retarded
individuals typically exhibit a smaller
memory span (digit) capacity than normal
individuals (Spitz, 1973); (d) modality
effects are commonly observed in short-term
memory tasks (Madigan, 1971); and (e)
temporal or spatial grouping of items in a
sequence influences memory span (Spitz,
1973).

The major objective of this study was to 
investigate concurrently these various effects 
among normal and retarded children. Of 
particular concern were interactions in- 
volving developmental variables, type of 
material, and modality. Also of interest were 
correlations of memory span for different 
types of materials. 


METHOD 


Subjects 

All subjects were obtained from a public elementary
school serving a predominantly white,
middle-class population. Three groups of 15 children
were randomly selected from a second grade,
a sixth grade, and a special class for educable
mental retardates. The mean age of the second
graders was 7.9 years with a standard deviation
of .5 years. The corresponding descriptive statistics
for the sixth-grade children were 11.9 and .7
years, respectively.

The mean Stanford-Binet IQ for the retarded
children was 71.7 (SD = 5.1). Their mean chronological
age was 11.0 years with a standard deviation of
1.3 years. Mean mental age, obtained for each
retarded subject, was 7.8 years (SD = 1.3 years).
So, roughly speaking, the retarded children were
matched with the second-grade children in terms
of mean mental age and with the sixth-grade
children in terms of mean chronological age.

Sex distribution was approximately equal 
among the three groups. There were 9, 7, and 7 
girls, respectively, among the second-grade, sixth- 
grade, and special-class children. 


Design and Procedures 


Memory span thresholds were obtained for 
seven classes of materials (described below) in 
both visual and auditory modalities. This proce- 
dure resulted in 14 measures for each subject. 

The method for determining thresholds was 
similar to the procedures used with respect to the 
Digit Span Test contained in the Wechsler In- 
telligence Scale for Children (WISC). A sequence 
of given length was presented twice. The number 
of items in a sequence was increased one at a time 
until the child failed two sequences of a given 
length. Memory span was defined as the longest 
sequence the subject could recall, without error,
on two successive occasions. Two or three practice 
trials were given prior to each testing session. 
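
The threshold rule can be sketched as a short procedure. This is an illustrative reading of the description above, not the authors' scoring script: the callback `recalls_correctly`, the upper limit of 12 items, and the convention of returning 0 when even the shortest sequence is failed are all assumptions. It is also assumed that testing continues when only one of the two sequences of a given length is recalled, since the text states that testing stopped only after two failures at one length.

```python
def memory_span(recalls_correctly, start_length=2, max_length=12):
    """Return the longest length recalled without error on both presentations.

    recalls_correctly(length, attempt) is a hypothetical callback that
    returns True when the child reproduces the sequence perfectly;
    attempt is 1 or 2 for the two sequences given at each length.
    """
    span = 0  # assumed value when no length is passed twice
    for length in range(start_length, max_length + 1):
        first = recalls_correctly(length, 1)
        second = recalls_correctly(length, 2)
        if first and second:
            span = length          # recalled twice in succession
        elif not first and not second:
            break                  # two failures at this length end testing
    return span


def simulated_child(length, attempt):
    """Invented response pattern used only to exercise the procedure."""
    if length <= 4:
        return True                # both sequences recalled up to 4 items
    if length == 5:
        return attempt == 1        # only the first 5-item sequence recalled
    return False                   # both 6-item sequences failed


print(memory_span(simulated_child))  # prints 4
```

With this rule a child who passes one five-item sequence but not the other still goes on to six items, yet is credited with a span of only four.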

The examiner started each testing session with 
a two-item sequence. Items were presented at the 
rate of one per second. A successive presentation 
procedure was used to render auditory and visual 
presentations comparable. The auditory materials 
were spoken by the examiner. A slide projector 
was used for the visual presentation. 

The order of testing, with respect to both mate- 
rial and modality, was random for each subject. 
After completing the initial series of 14 tests, the 
entire procedure was repeated for each subject. 
This provided test-retest reliabilities of the 
threshold for each type of material under auditory 
and visual presentation. 


Materials 


Seven classes of material were used to obtain
immediate memory span thresholds: digits, letters,
three-letter words, animal pictures, colors,
mixed (letters and digits), and paired associates
of letters and digits. Materials were presented
both visually and orally.

1. The digit lists were constructed to avoid
common dates (e.g., 1973) and sequences which
follow each other in the usual numerical order.

2. Letter sequences were arranged randomly
with the exception that care was taken to avoid
the spelling of words or combinations of letters
that follow each other in the alphabet.

3. The animal lists were composed of the following:
tiger, monkey, sheep, horse, turkey, rabbit,
m hicks and bie. Preliminary to testing, each
subject was asked to identify the animal pictures
and names.

4. The word lists were constructed from three-letter
words chosen from a second-grade reading
book. These included bee, car, pot, egg, hat, man,
boy, du mu pig, cow, pin, rat, dog, toy, sun,
op, and day. Subjects were asked to identify
each word before testing.

5. The colors were as follows: red, orange,
black, brown, blue, yellow, green, and purple.
The visual presentation was accomplished by placing
colored tapes on clear slides. Again, subjects
were asked to identify the colors before they were
tested.

6. The mixed class of material consisted of
letters and digits. Common associations were
avoided. When the sequence contained an even
number of terms, digits and letters were equally
represented in random order. When the sequence
included an odd number of items, the lists were
counterbalanced such that letters and numbers
were presented equally often across lists.

7. In an effort to include a task that forced some
organization upon the material, the association
memory span test was devised. The associative
tasks consisted of letters (stimuli) and digits
(responses) shown or read together, for example,
Z-5, P-3, M-2, and V-9. Common associations
were avoided. Visual presentation consisted of
simultaneous presentation of both terms of a
pair, while in the auditory mode a successive
presentation was made.
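
The construction rule for the mixed lists (item 6) lends itself to a small sketch. Everything below is an assumption made for illustration: the function name, sampling without replacement, the use of the full alphabet and all ten digits, and the `extra_kind` argument by which a caller alternates the surplus item between letters and digits across odd-length lists. The screening of "common associations" mentioned in the text is not modelled.

```python
import random
import string


def mixed_sequence(length, extra_kind, rng):
    """Build one mixed letter/digit sequence of the requested length.

    For even lengths letters and digits are equally represented; for odd
    lengths the surplus item is a letter or a digit according to
    extra_kind ("letter" or "digit"), so that alternating extra_kind over
    successive lists counterbalances the two kinds.
    """
    n_letters = length // 2
    n_digits = length // 2
    if length % 2 == 1:
        if extra_kind == "letter":
            n_letters += 1
        else:
            n_digits += 1
    letters = rng.sample(string.ascii_uppercase, n_letters)
    digits = rng.sample(string.digits, n_digits)
    items = letters + digits
    rng.shuffle(items)  # random order of letters and digits within the list
    return items


rng = random.Random(1975)  # arbitrary seed so the example is repeatable
print(mixed_sequence(4, "letter", rng))
print(mixed_sequence(5, "letter", rng))
print(mixed_sequence(5, "digit", rng))
```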


RESULTS


A variety of procedures have been employed
to score memory spans, ranging from
correct or incorrect for the entire sequence
to a weighted system that gives credit for
partially accurate responses. These various
scoring methods produce highly correlated
results for children and retardates (Hawkins
& Baumeister, 1965). In view of this fact
and because the focus of this study was upon
memory thresholds, the dependent measure
employed here was the longest sequence the
subject was able to recall correctly twice in
succession.

Table 1 presents mean thresholds for each 
of the three groups by type of material and 
modality. These scores are averaged over the
two testing sessions. 

Test-retest reliability coefficients are presented
in Table 2. Although no statistical
comparisons were made of the reliability
coefficients, it is clear that they are generally
somewhat higher for the sixth-grade children
than for either of the other two subject
groups.

Preliminary analyses were conducted to
determine whether the sex variable was related
to recall thresholds. No significant
difference was obtained for any comparison
and, for all further analyses of these data,
the sex variable was ignored.

A mixed effects 3 X 7 X 2 X 2 (Group X
TABLE 1
Mean Thresholds


                              Group*
Material        Special class   2nd grade    6th grade

Digit
  Visual        3.6 (.9)        4.3 (1.0)    5.9 (1.3)
  Auditory      3.6 (.8)        4.8 (.8)     5.9 (1.1)
Color
  Visual        3.1 (.8)        3.9 (.7)     5.0 (1.1)
  Auditory      3.5 (.8)        4.7 (.9)     5.2 (.9)
Letter
  Visual        2.8 (.8)        3.8 (.9)     5.0 (1.1)
  Auditory      3.4 (.9)        4.5 (.9)     5.4 (1.1)
Association
  Visual        1.7 (.4)        2.2 (.6)     2.8 (.5)
  Auditory      2.1 (.5)        2.6 (.5)     3.0 (.7)
Word
  Visual        2.6 (.5)        3.6 (1.1)    5.0 (1.3)
  Auditory      3.6 (.8)        4.1 (1.0)    4.8 (.8)
Mixed
  Visual        3.3 (.8)        4.0 (1.0)    5.4 (1.2)
  Auditory      3.6 (.7)        4.5 (1.0)    5.5 (1.1)
Picture
  Visual        3.0 (.7)        4.0 (.8)     4.8 (1.1)
  Auditory      3.4 (.7)        4.3 (.5)     5.3 (1.0)


* Standard deviations are given in parentheses.


Material X Modality X Test-Retest)
analysis of variance was applied to the
threshold scores. Significant main effects
were found for all variables (groups: F =
23.5, df = 2/42, p < .001; type of material:
F = 157.7, df = 6/252, p < .001; modality:
F = 35.0, df = 1/42, p < .001; test-retest:
F = 49.0, df = 1/42, p < .001). Under all
conditions, the sixth graders were superior
to the second graders, who were, in turn,
superior to the special-class children. With
few exceptions, auditory presentation produced
better recall than visual presentation.
Performance was generally better on the
second occasion of testing than on the first.
Digit span thresholds were highest and, of
course, the association spans were lowest.
The mixed class of materials, including
digits and letters, ranked second for all
groups. Overall, none of the differences
between thresholds for colors, letters, words,
or pictures was marked.
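
The degrees of freedom reported for this analysis follow from the design if it is treated as a standard split-plot arrangement: 45 subjects nested in 3 groups, with material, modality, and test-retest as repeated measures. The sketch below makes that assumption explicit; the function and factor names are illustrative, and each within-subject effect is assumed to be tested against its own interaction with subjects-within-groups.

```python
def mixed_design_df(n_subjects, n_groups, within_levels, effect):
    """Return (df_effect, df_error) for one effect in a split-plot ANOVA.

    within_levels maps each repeated-measures factor to its number of
    levels; effect is a tuple of factor names and may include "Group",
    the single between-subjects factor.
    """
    subjects_within_groups = n_subjects - n_groups
    within_df = 1
    for name in effect:
        if name != "Group":
            within_df *= within_levels[name] - 1
    group_df = (n_groups - 1) if "Group" in effect else 1
    df_effect = group_df * within_df
    df_error = subjects_within_groups * within_df
    return df_effect, df_error


levels = {"Material": 7, "Modality": 2, "Test-Retest": 2}
for effect in [("Group",), ("Material",), ("Modality",), ("Test-Retest",)]:
    print(effect, mixed_design_df(45, 3, levels, effect))
```

Under these assumptions the four main effects come out at 2/42, 6/252, 1/42, and 1/42, matching the values given in the text; interaction terms are obtained the same way by listing several factor names in `effect`.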

However, some significant interactions
were observed, including Group X Type of
Material (F = 5.0, df = 12/252, p < .001),
Group X Test-Retest (F = 3.9, df = 2/42,
p < .03), Modality X Test-Retest (F =
10.8, df = 1/42, p < .01), Type of Material
X Test-Retest (F = 2.5, df = 6/252, p <
.03), and Group X Type of Material X Modality
(F = 2.5, df = 12/252, p < .004).
"These interactions were further analysed 
by the Tukey Q procedure. Following are 
some of the clearer findings that emerged 
from these subanalyses: (a) auditory thresh- 
olds were generally higher than visual 
thresholds, more so on the first occasion of 
testing than on the second; (b) the test- 
retest effect was greater for the visual 
modality than for the auditory modality 
but significant in both cases; (c) generalizing 
across materials and modalities, the practice 
effect was greater for the sixth graders, least 
for the retarded children, and significant for 
all groups; and (d) ignoring the association 
spans, the retarded children exhibited 
virtually identical thresholds for all material 
presented in the auditory modality. Given 
visual presentation, however, the retarded 
subjeets displayed fairly large threshold 
differences across the various types of ma- 
terials. The sixth graders, on the other hand, 


TABLE 2 
TEST-RETEST RELIABILITY COEFFICIENTS FOR
EACH GROUP BY TYPE OF MATERIAL AND
MODALITY OF PRESENTATION


                         Group
Span          | Mentally retarded | 2nd grade | 6th grade

Digit
  Visual      | .67 | .44  | .82
  Auditory    | .53 | .51  | .78
Color
  Visual      | .35 | .18  | .75
  Auditory    | .82 | .46  | .33
Letter
  Visual      | .83 | E    | .63
  Auditory    | .79 | .68  | .80
Association
  Visual      | .00 | .60  | .65
  Auditory    | .59 | .87  | .15
Word
  Visual      | .48 | .76  | .93
  Auditory    | .53 | .68  | .61
Mixed
  Visual      | .57 | .38  | .74
  Auditory    | .75 | .61  | .71
Picture
  Visual      | .64 | -.04 | 7
  Auditory    | .30 | 2    | .63
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revealed threshold differences for the various 
types of material under both visual and 
auditory modalities. The second-grade sub- 
jects were intermediate in these respects. 

Correlations were computed among thresh- 
olds for the various classes of materials 
within each of the subject groups. Consider- 
ing the sample sizes in each group relative to 
the number of variables measured for each 
subject, it was not feasible to perform a 
factor analysis upon these intercorrelations. 
However, it is possible to look comparatively 
at some of the general patterns that emerged 
from the correlations. The average correlations
of all threshold spans were .50, .63,
and .76 for the retarded subjects, the
second graders, and the sixth graders,
respectively.

Another way of looking at these results is
to compare the correlations among spans
within a modality with the correlations
between modalities. These correlations are presented
in Table 3. In general, it appears that the
relationships among memory span thresh- 
olds for various types of material are some- 
what higher when the materials are pre- 
sented within the same modality. 

One of the tasks, in particular, explicitly
incorporates reading skills: words presented
visually. It was on this task that the
retardates and second graders displayed
their relatively weakest performance (excluding
the association thresholds). The
sixth graders, on the other hand, performed
as well with words as they did with colors,
letters, and pictures. The correlations between
visual word spans and span thresholds
for other types of visual material were .45,
.65, and .24 for the sixth graders, second
graders, and the special-class pupils, respectively.
This was a particularly reliable task
for the sixth graders (r = .93).


TABLE 3


INTERCORRELATIONS FOR VISUAL, AUDITORY, AND
VISUAL-AUDITORY TASKS


Group             | Visual | Visual-auditory | Auditory | T

Mentally retarded | .42    | .46             | .61      | .50
2nd grade         | .72    | .57             | .60      | .63
6th grade         | .78    | .72             | .79      | .76


Discussion 


These results clearly show a developmental
trend in memory span capacity for a
large variety of materials. In every instance
of the seven classes of materials, the sixth-grade
children exhibited significantly higher
thresholds than either of the other groups.
This is not an entirely unexpected finding
in that growth functions for memory span
have been observed previously for both
normal and retarded subjects (e.g., Korst &
Irwin, 1968). The obvious inferiority of the
retarded subjects is of considerable interest,
particularly when the comparison is made
against the normal subjects of matched
mental age. Again, for every type of material,
the special-class students had lower
thresholds. The finding that this deficiency
applied for all types of materials in both the
auditory and visual modalities suggests that
the relationship between memory span and
IQ is a fairly general phenomenon.

The source of these developmental trends,
however, is not readily determined from the
present study, which was designed primarily
for parametric purposes. One important issue
is whether the developmental differences
observed here represent differential encoding
or retrieval capacities. There is some evidence
to suggest that encoding factors may
be the source of the differential performance
among the three subject groups. The older
and/or brighter subjects may have employed
more sophisticated grouping or
organization strategies (MacMillan, 1970; Spitz,
1973).

The associative spans were considered to
be a special case of the mixed-class materials.
One way of looking at the associative
task is that it requires the subject to
"chunk" or organize the individual items
into groups of two. So, by doubling the
number of associations that defined each
subject's threshold, a measure of total items
remembered in series was obtained. A
comparison of the mixed-class spans and
the associative spans enables us to determine
the effect that forced chunking has upon
memory span.


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Because of the low reliability of the visual 
association task for the special-education 
children, only the auditory tasks were com- 
pared. The advantage of the associative 
over the mixed spans was .7, .6, and .5 items 
for the special-class, second-grade, and 
sixth-grade children, respectively. While the 
overall difference between types of spans 
was significant, none of the differences 
between subject groups was significant, 
indicating that all groups benefited about 
equally from this type of associative chunk- 
ing. 
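
The doubling step described above can be illustrated with the rounded auditory means printed in Table 1. This is only a sketch: the variable and function names are hypothetical, and because the inputs are table means rounded to one decimal, the differences it yields need not match the reported values (.7, .6, and .5), which were presumably computed from unrounded subject-level scores.

```python
# Mean auditory thresholds copied from Table 1 (rounded to one decimal place).
GROUPS = ["special class", "2nd grade", "6th grade"]
auditory_association = [2.1, 2.6, 3.0]  # number of associations (pairs)
auditory_mixed = [3.6, 4.5, 5.5]        # number of single items


def chunking_advantage(pairs, items):
    """Total items held as pairs (two per association) minus the mixed span."""
    return [round(2 * p - m, 1) for p, m in zip(pairs, items)]


advantage = chunking_advantage(auditory_association, auditory_mixed)
for group, gain in zip(GROUPS, advantage):
    print(f"{group}: {gain:+.1f} items")
```

Working from group means rather than individual thresholds is a simplification; the significance tests reported in the text cannot be reproduced from these summary figures.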

Another general finding was that auditory 
presentation produced higher thresholds 
than visual presentation of the material. 
Again, the study was not designed to
elucidate the processes that might be
implicated with respect to modality effects;
but, in consideration of the oral recall re- 
quirement, we may presume that less trans- 
formation of the material was required 
under the auditory presentation. 

Modality effects favoring auditory input 
have been observed previously in short- 
term memory experiments. A number of 
hypotheses have been advanced to account 
for this phenomenon. For instance, it has 
been suggested that auditory stimuli are 
encoded more rapidly into short-term 
memory (Laughery & Pinkus, 1966). Others 
have argued for separate iconic and echoic 
memory stores with input remaining active
longer in the latter (e.g., Craik, 1969). In 
any case, this is a difference that seems to 
occur across various types of inputs, with 
the possible exception of digits. The magni- 
tude of the modality effect seems to be 
related to developmental status. The significant
interaction involving Groups X
Modality X Materials was, in part, attributable
to a much more comparable
performance of the sixth-grade children for
auditory and visual presentation. Differences
between the second graders and
the retardates were most pronounced in the 
auditory modality for relatively simple 
stimuli (digits, colors, letters, mixed). 

It should be noted that while span thresholds
were obtained for seven different classes
of materials, the distinction seems to be
more obvious and perhaps more meaningful
in relation to the visual modality. Under the
auditory condition, names of items con- 
tained in the sequences of colors, pictures, 
and words had to be spoken to the subject. 
Presumably under this condition, the ma- 
terials should be processed in identical codes, 
although it might be argued that categorical 
constraints were differentially involved. 

Actually, it was the retarded subjects who 
displayed the most even performance across 
the various types of material presented in 
the auditory modality (see Table 1). Ex- 
cluding the association spans, the retardates 
showed no significant threshold difference 
for materials under auditory presentation. 
There was more variability evident across 
thresholds in the performance of the second 
graders and particularly the sixth graders 
under auditory presentation. This finding 
suggests that the retardates may have relied 
much more heavily on an acoustic storage 
system and/or direct read out from that 
system. 

Although it was not feasible to perform a 
factor analysis on the correlational data, an 
inspection of the correlation matrices sug- 
gests that, in general, there is a strong 
relationship among the various span meas- 
ures. The average correlation was highest 
for the sixth-grade children and lowest for 
the special-class children, but substantial 
in all cases. These findings are consistent 
with conclusions from earlier studies re- 
garding the operation of a general memory 
span factor (Brener, 1940). Looking at 
Table 3, there is also an indication that the 
correlations are somewhat higher within 
modalities than between modalities. 

The one task that explicitly seems to 
involve reading processes is the visual word 
span. Marked differences were found be- 
tween normal and retarded subjects with 
respect to the correlation between visual 
word span and spans for other types of 
visual materials. For both groups of normal 
children, visual word spans were highly 
correlated with performance on the other 
visual tasks; but this was not the case for 
the retarded subjects. It should be em- 
phasized that all subjects, including re- 
tardates, were able to identify the printed 
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words. It may tentatively be suggested that 
the processes involved in reading enter more 
heavily into sequential short-term memory 
of normal children than of retarded children. 
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Three experiments were performed to investigate situations in which 
lowering the reading grade level of textual material or providing tape- 
recorded auditory supplementation to the reading material would 
provide maximum comprehensibility gain. For typical Air Force tech- 
nical training materials (study guides) and for an on-the-job look up 
manual, no gain was evidenced as the result of lowering the reading 
grade level or of auditory supplementation. However, for home study 
course materials, lowering the reading grade level resulted in increased
criterion test scores.


Both during training and on the job, a 
high dependence is typically placed on 
written language for communicating infor- 
mation to technical personnel. Accordingly, 
materials that are written at a level which
surpasses the literacy skill of technical personnel
can be expected to impair on-the-job
performance. Caylor, Sticht, Fox, &
Ford (1972), for example, found that the
reading grade level of military publications
in the Army occupations they investigated
exceeded the average reading ability of the
men by one to six years. Madden and
Tupes (1966) documented the number of 
low-reading-ability airmen in the Air Force 
and Mowry, Webb, & Garvin (1955) showed 
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that for Navy trainees, much of the re- 
quired written technical materials was too 
difficult. Sticht and Kern (1971) found that, 
especially for low-aptitude personnel, the 
greater the gap between the literacy skills 
of the men and the literacy requirements of 
printed technical material, the less their 
tendency to use that material. 

Since technical personnel employ many 
types of textual materials under varying 
conditions, clarification of the circumstances 
under which lowering the reading grade 
level of textual materials yields maximum 
benefit would seem important. This issue 
was the first concern of the present work. 
The second concern was the investigation 
of the effects of auditory supplementation 
(listening to tapes while reading) on com- 
prehensibility. It seems that multimodal 
presentation could possess benefits because 
it would (a) serve to refocus the reader’s 
attention in the case of distraction, (b) help 
the person who can comprehend the spoken 
language but who possesses a reading dis- 
ability, and (c) serve as a redundant and 
noninterfering information source. There 
is conflicting evidence regarding the multimodal
presentation issue. Long (1970),
Virag (1971), and Van Mondfrans and
Travers (1962) have presented negative evidence
while Severin (1967) and Singer
(1970) have reported positive results. It is 
possible that the different types of materials 
and the varying conditions of use in these 
studies contributed to the discrepant find- 
ings. 

Specifically, the hypothesis underlying 
this set of studies was that materials modifi- 
cation (either through lowering the reading 
grade level of textual materials or through 
multimodal presentation) would produce 
differential comprehensibility effects de- 
pending on the type of materials, the pur- 
pose of the materials, and the conditions of 
use. To this end, three types of textual mate- 
rials (in-course study materials, home study 
materials, and a reference manual used on 
the job) were investigated within the Air 
Force materiel facilities occupational spe- 
cialty. In all, three experiments were per- 
formed—one relative to each type of learn- 
ing material and comprehensibility context. 


Materials Development 


All of the experiments employed written
materials used in the Air Force materiel
facilities (supply) specialty. The tasks performed
by these specialists involve base
supply ordering, inventory control, and distribution.
The stimulus materials used in
Experiment 1 consisted of a course study
guide used during two intermediate weeks
(three and four) of a formal, intensive, five-week
materiel facilities specialist training
course. The stimulus materials used in Experiment 2
consisted of career development
course guides used as a home study manual
by airmen who do not attend the formal
training course but are seeking promotion
within this supply specialty. The sections of 


the home study manual contained, basically,
the


received a simplified version of the appropriate


To develop the simplified materials, guidelines were
followed which have been suggested for this
purpose in the readability literature (Flesch,
1951; Klare, 1963). These were as follows:
(a) using shorter words, (b) using shorter
sentences, (c) avoiding "complicated" sentence
constructions, (d) "personalizing" the
material by the use of personal words and
sentences, (e) using the active instead of the
passive voice, and (f) presenting information
in a stepwise manner.

For example, an unmodified original passage
read as follows:


The Air Force uses two basic methods—prepost
and postpost—for issuing supplies. A prepost issue
is processed through the computer before the property
is actually selected from storage. A postpost
issue is made before the issue request is processed
through or into the computer. A simple way to determine
which method is being used is that prepost
documents are machine printed and postpost
documents are normally handscribed.


The same passage simplified read as follows:


The Air Force uses two methods for issuing supplies.
These are prepost and postpost. A prepost
issue is processed by the computer before the supplies
are taken from storage. A postpost issue is
processed by the computer after the supplies are
taken from storage. You can easily tell which
method is being used because prepost order documents
are printed by machine. Postpost order
documents are usually written by hand.


The simplified written revisions of the
course study guide and home study manual
materials and the unmodified originals were
recorded on magnetic tape in cassette form.
These tapes followed the written text verbatim
and were identified for simultaneous
use by subjects in certain treatment groups
as they were reading the textual materials.


Materials Verification 


The intent of the materials modification 
was to lower the reading grade level of the 
textual materials so that it would be within 
the reading grade level of the person using 
the materials for learning purposes. Such a
modification would allow an evaluation of
the effects of the modification on the ability
of the different materials to transfer information.
The automated readability index
(ARI; Smith & Senter, 1967) and FOR- 
CAST (Caylor et al., 1972) formulas were 
first used for this purpose. These formulas 
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were selected rather than the more typically 
employed Dale-Chall (1948) and Flesch 
(1951) formulas since they were validated 
on military samples using technical mate- 
rials. The ARI technique for estimating 
readability is based on word difficulty and 
sentence length. The FORCAST method is 
based on the number of one-syllable words. 
Both techniques yield a score which can be 
directly converted to a reading grade level 
for the textual material under consideration. 
Approximately 50% of the stimulus reading
material, randomly selected from the modified
and original course study guide, home
study manual, and reference manual materials,
was analyzed using the two procedures.
Table 1 shows the results of the analyses for 
the different stimulus materials. 
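
As a rough illustration of how such formulas turn text counts into a grade level, the sketch below applies the commonly cited forms of the two indices to the sample passages quoted earlier. The coefficients are assumptions taken from the general readability literature rather than from this report (ARI = 4.71 x characters per word + 0.5 x words per sentence - 21.43; FORCAST = 20 - N/10, where N is the number of one-syllable words in a 150-word sample), and the tokenizing rules and function names are hypothetical. They should be checked against Smith and Senter (1967) and Caylor et al. (1972) before any serious use.

```python
import re

ORIGINAL = (
    "The Air Force uses two basic methods, prepost and postpost, for issuing "
    "supplies. A prepost issue is processed through the computer before the "
    "property is actually selected from storage. A postpost issue is made "
    "before the issue request is processed through or into the computer. A "
    "simple way to determine which method is being used is that prepost "
    "documents are machine printed and postpost documents are normally "
    "handscribed."
)

SIMPLIFIED = (
    "The Air Force uses two methods for issuing supplies. These are prepost "
    "and postpost. A prepost issue is processed by the computer before the "
    "supplies are taken from storage. A postpost issue is processed by the "
    "computer after the supplies are taken from storage. You can easily tell "
    "which method is being used because prepost order documents are printed "
    "by machine. Postpost order documents are usually written by hand."
)


def ari_grade(text):
    """ARI grade estimate from characters per word and words per sentence."""
    words = re.findall(r"[A-Za-z0-9]+", text)   # runs of letters or digits
    sentences = re.findall(r"[.!?]+", text)     # runs of end punctuation
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def forcast_grade(one_syllable_words):
    """FORCAST grade estimate from the one-syllable word count of a
    150-word sample (the count is supplied by the caller)."""
    return 20 - one_syllable_words / 10


print(f"ARI, original passage:   {ari_grade(ORIGINAL):.2f}")
print(f"ARI, simplified passage: {ari_grade(SIMPLIFIED):.2f}")
```

The FORCAST function takes the syllable count as an input because reliable automatic syllable counting needs a pronunciation dictionary; a hand count over a 150-word sample is the simpler route.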

The results indicated that the simplifica- 
tion procedures achieved the desired results, 
with the difference between the original and 
simplified versions showing up to a some- 
what greater degree in the ARI-determined 
reading grade levels. The mean reading 
grade level of airmen in the Air Force supply
field is 10.2. The ARI results suggested the
original materials to be above this level and 
the modified materials to be considerably 
below this level. The FORCAST results in- 
dicated the modified materials to more 
closely match, but not to meet, the reading 
grade level of supply field personnel. 

Convergent verification of the simplification
procedure was also accomplished by
using the cloze technique (Taylor, 1953).
Eight randomly selected 150-word sample
paragraphs from the original and the simplified
course study guide and from the
home study manual materials, matched for
content, were administered to 39 airmen
trainees in the first two weeks of the
materiel facilities specialist training course.
Each subject received both simplified and
original materials (of different content) in a
random order. Thus, each subject received
eight course study guide and two home study
manual paragraphs. The actual order of
presentation of the passages to each subject
was randomly determined. Following a
standard cloze test procedure, every fifth
word from a randomly determined starting
point in each paragraph was deleted. Mean
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TABLE 1 


MEAN READING GRADE LEVEL AS MEASURED BY
ARI AND FORCAST METHODS FOR ORIGINAL
AND SIMPLIFIED MATERIALS


              |        ARI            |       FORCAST
Material type | Original | Simplified | Original | Simplified

Study guides  | 11.53    | 6.80       | 11.75    | 10.02
HSM           | 12.74    | 7.40       | 11.95    | 11.20
AFM 67-1      | 14.57    | 8.32       | 12.53    | 11.09


Note. Abbreviations: ARI = automated readability
index, HSM = home study manuals, and
AFM 67-1 = Air Force Manual (67-1).


cloze scores (number of deleted words cor- 
rectly identified) for each trainee on the 
simplified and original versions were sepa- 
rately compared for the course study guide 
and the home study manual passages. In 
both the course study guide and home study 
manual passages, the simplified materials 
were found to be easier to read (t = 4.00,
df = 38, p < .001 and t = 2.72, df = 38, p <
.01, respectively).
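
The deletion rule used for the cloze passages (every fifth word from a random starting point) can be sketched as follows. This is an illustrative reconstruction, not the authors' materials: the blank marker, the exact-match scoring rule, and the function names are assumptions.

```python
import random


def make_cloze(text, every=5, start=None, rng=random):
    """Blank out every `every`-th word, beginning at index `start`.

    If `start` is None, a random start within the first `every` words is
    drawn. Returns the cloze text and a dict of position -> deleted word.
    """
    words = text.split()
    if start is None:
        start = rng.randrange(every)
    answers = {}
    for i in range(start, len(words), every):
        answers[i] = words[i]
        words[i] = "_____"
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Count responses that reproduce the deleted word exactly,
    ignoring case and surrounding punctuation (an assumed scoring rule)."""
    def norm(word):
        return word.strip(".,;:!?\"'()").lower()
    return sum(
        1 for pos, word in answers.items()
        if norm(responses.get(pos, "")) == norm(word)
    )


passage = "A prepost issue is processed by the computer before the supplies are taken"
cloze_text, key = make_cloze(passage, start=2)
print(cloze_text)
```

Each trainee's score on the simplified and original versions would then be the count returned by `score_cloze`, and the paired comparison reported above is made across trainees.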


Criterion Development 


In order to test the comprehensibility of 
the original and the simplified textual mate- 
rials, a set of criterion tests was developed 
using the multiple-choice format. To de- 
velop the course study guide and home study 
manual associated tests, an item pool was 
first assembled. Sixty multiple-choice items 
were written to cover the content of the 
course study guide for Weeks 3 and 4 (30 
items per week), and 40 items were written 
covering the selected chapters of the home 
study manual. The number of test items 
developed was proportional to the amount 
of material covered in each section of the 
stimulus materials. The course study guide 
and home study manual test items were 
combined into a 100-item preliminary test 
which was administered to 63 materiel 
facilities specialist trainees who had com- 
pleted Week 4 of the formal training course. 
Two preliminary test forms were con- 
structed with the items presented in a 
reverse order on Form B. Approximately 
half of the subjects received each form. 
Item-test correlations and difficulty indices
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were calculated for each item (Fan, 1952a, 
1952b). An item-test correlation of r > .21
(p ≤ .10) was accepted as a cutoff point for
item inclusion in the final criterion test
forms. Fifteen additional test items for each
block were randomly selected from each of
the criterion test forms currently employed
by course personnel. These items had been 
previously developed item analytically. The 
final criterion tests included 30 test items 
for each course week and 30 for the home 
study manual. 
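
The item-selection rule can be sketched as below. The report used Fan's (1952a, 1952b) item-analysis tables; this example substitutes a plain Pearson correlation between each 0/1 item score and the total test score (with the item itself included in the total) as an approximation, so the function names, the data layout, and the choice of statistic are all assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_items(responses, cutoff=0.21):
    """Return (item index, difficulty, item-test r) for items with r > cutoff.

    `responses` holds one list of 0/1 item scores per subject. Difficulty is
    the proportion of subjects answering the item correctly.
    """
    totals = [sum(row) for row in responses]
    kept = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        r = pearson_r(item, totals)
        if r > cutoff:
            kept.append((j, sum(item) / len(item), r))
    return kept


# Hypothetical scores for five subjects on three items.
scores = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 0, 1]]
for index, difficulty, r in select_items(scores):
    print(f"item {index}: difficulty {difficulty:.2f}, r = {r:.2f}")
```

Including the item in the total inflates the correlation slightly for short tests; with a 100-item preliminary form the effect would be small.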

The multiple-choice format was also em- 
ployed for the reference manual criterion 
test. The items for this test were not item 
analytically developed. They were selected, 
however, to represent a systematic sample 
of the paragraphs in the included sections 
for the reference manual. 


Reading Grade Level Determination 


Madden and Tupes (1966) provided 
equations for predicting the reading grade 
levels of Air Force enlistees from aptitude 
test scores obtained from the Airman 
Qualifying Examination (AQE). It is pos- 
sible that the sample to which the formula 
would be applied in the present study would 
possess different characteristics from that on 
which the formulas were originally developed.
To assess this possibility, a sample
of 200 materiel facilities specialist trainees 
enrolled in the formal training course dur- 
ing the fall of 1971 was randomly selected. 
Application of the Madden and Tupes for- 
mula for administrative career fields to this 
new sample found the mean reading grade 
level to be 10.2—the identical value reported 
by Madden and Tupes for 1961-1962 data. 
The distributions of scores also appeared 
to be similar, and it was concluded that the 
equation was stable and could serve as a
predictor of reading grade level for the subjects
in this study.


Experiment 1 


This experiment was designed to investi- 
gate the gain, if any, from lowering the 
reading grade level of study materials used 
as a classroom learning supplement and the 
learning advantages from supplementing 
the materials with a tape-recorded presentation
which was listened to while the materials
were read.


METHOD 


Subjects 


Sixty-six materiel facilities specialist trainees at
the Logistics School at Lowry Air Force Base were
selected to serve as subjects. All subjects were
about to begin Week 3 of their five-week training
course. It was necessary for each subject to complete
successfully Week 3 prior to advancing to
Week 4. Five subjects who failed the Week 3 criterion
tests were forced to repeat Week 3. It was
possible to place four of these subjects in a class
receiving the same treatment, and they continued
in the experiment. Each passed the Week 3 criterion
test after the repetition and was advanced
to Week 4. The result was a total of 65 subjects.


Experimental Design 


Four treatment groups were formed with classes
of subjects randomly assigned to a treatment.
A control group used as stimulus material the
original, unsimplified written study guides; one
experimental group used the original written study
guides but, in addition, had the tape recordings of
the original lessons. Subjects in this treatment
group were instructed to listen to these tapes while
simultaneously reading the required materials. The
third group employed the simplified study guides,
and the fourth group used both the simplified
written materials and tape recordings of the
simplified lessons. For analytic purposes, all subjects
were postexperimentally divided into high-
and low-reading-ability groups (split at a reading
grade level of 10) by using the Madden and Tupes
reading grade level determination formula for administrative
career fields in the Air Force.


Procedure 


All subjects, after completing Week 2 of the
training course, were read instructions indicating
the nature of the study in which they were about
to participate. They were told that they had been
chosen to take part in a program designed to compare
various methods of presenting instruction
materials and that they, personally, could benefit
in the future from the results. For the treatment
groups using auditory supplementation, a demonstration
of the use of a Honeywell playback device
was given. These subjects were additionally instructed
to listen to the appropriate tape and to
read simultaneously their study guides at least
once, although they could read and listen to them
as often as they wished. The use of the playback
device was restricted to weekday nonclass hours.
Monitoring by course personnel ensured the proper
use of the auditory materials.

Each treatment group was composed of two
classes. Members of each class progressed together
from Week 3 to Week 4, except for the failures
who repeated Week 3 before progressing to Week
4. Written questionnaires (the questions were also
read to the subjects) were administered on the final
day of each block prior to the criterion test
session. The questionnaires included two items
which inquired into the difficulty and likability, respectively,
of the written course study guide using
five-point Likert-type items. Data for only the
questionnaire administered at the end of Week 4
were analyzed to allow for maximum subject familiarity
with the stimulus materials.


RESULTS 


The criterion test scores for Weeks 3 and 
4 were summed for each subject independ- 
ently for both the criterion test items devel- 
oped specifically for this study using item- 
analytic procedures as well as for those 
randomly selected from the test questions 
normally used in the course. No statistically 
significant differences were found between 
the two sets of criterion scores (those devel-
oped here and those previously developed) 
in a repeated measures analysis of variance 
(F = 1.62, df = 1/64, p > .05), and the two 
sets of scores were pooled in the subsequent 
analysis. An unweighted means analysis of 
variance was performed on the criterion test 
data to determine the effects of experimental 
materials and reading grade level. A sig- 
nificant main effect for reading grade level 
was observed with the better readers scoring 
higher on the criterion tests (F = 10.92, 
df = 1/47, p < .01). No statistically signifi- 
cant experimental materials main effect 
(F < 1, df = 3/47, p > .05) or interaction 
effect (F = 1.05, df = 3/47, p > .05) was 
observed. 

Two 2 x 2 unweighted means analyses 
were performed on the questionnaire items. 
The analyses compared the difficulty and 
likability, respectively, of the original and 
modified course study guide materials used 
in Weeks 3 and 4 with those used in Weeks 
1 and 2 for the high- and low-reading-grade- 
level subjects. No statistically significant 
differences were observed for reading grade 
level in either analysis (F < 1, df = 1/61, 
p > .05 for difficulty; F = 3.15, df = 1/61, 
p > .05, for likability) or for experimental 
materials difficulty (F < 1, df = 1/61, 
p > .05). A statistically significant effect for 
experimental material likability was ob- 
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served with the modified materials judged 
more likable (F = 6.16, df = 1/61, p < .05). 


Experiment 2 


Experiment 2 paralleled Experiment 1
in purpose but employed the home study 
manual materials. 


METHOD 


Subjects 


Forty-nine Air Force trainees at Lowry Air 
Force Base awaiting assignment to schools for 
specialty training served as subjects. 


Experimental Design 


Three treatment groups were formed. The con- 
trol group used the original, unmodified home study 
manual. One experimental group used the simpli- 
fied home study manual and a second experimental 
group used the simplified home study manual with 
auditory supplementation. For analytic purposes, 
all subjects were postexperimentally divided into 
high- and low-reading-ability groups (split at a reading
grade level of 10). 


Procedure 


On arrival at the test center, subjects were ran- 
domly assigned to one of the three experimental 
treatments. Each treatment was administered in a
separate room. The instructions given to the sub- 
jects were similar to those used in Experiment 1, 
noting that the purpose of the study was to com- 
pare various reading materials. The subjects were 
also told that after studying their materials for 
two hours (an amount of time judged by the course 
personnel to be more than adequate for even a slow 
reader), they would take a “very hard test" to 
measure how much they had learned. As before, 
the subjects in the auditory supplementation con- 
dition were told to listen and read simultaneously. 
Subjects were allowed to take breaks whenever 
they wished. However, they were repeatedly ad- 
vised of the difficulty of the forthcoming test and 
told to study and learn as much as they could. 
After the study period, the home study manual 
materials were collected and the tapes were turned 
off. The 30-item multiple-choice test, developed
as previously described, was then administered. 


RESULTS 


An unweighted means analysis of vari- 
ance was performed on the criterion test 
data to determine the effects of experimental 
materials and reading grade level. A statis- 
tically significant main effect for experi- 
mental materials (F = 3.31, df = 2/43, 
p < .05) was the only statistically significant
effect observed. Application of a Newman-Keuls
test indicated that the treatment
group which received the simplified
home study manual materials scored sig- 
nificantly higher than the group which re- 
ceived the original materials (p < .05). The 
group working with the simplified materials 
and with auditory supplementation scored 
intermediate between the other two and 
was not different from either of them at 
a statistically significant level. The high- 
reading-ability subjects received somewhat 
higher criterion scores than the low-reading- 
ability subjects. However, neither this dif- 
ference (F < 1, df = 1/43, p > .05) nor the 
interaction effect (F < 1, df = 2/43, p >
.05) was statistically significant.


Experiment 3


Experiment 3 paralleled the prior two ex- 
periments in purpose but employed the 
reference manual as the reading material. 


METHOD
Subjects 


Thirty-five airmen awaiting assignment to 
schools for specialty training at Lowry Air Force
Base served as subjects. All subjects had also par-
ticipated in Experiment 2.


Experimental Design 


Two experimental groups were formed. One 
group used the original, unmodified reference 
manual while the second group used the simplified 
version. As in Experiments 1 and 2, for analytic
purposes, all subjects were postexperimentally di-
vided into high- and low-reading-ability groups
(split at a reading grade level of 10).


Procedure 


... They were cautioned ... any one question,
since less than one half of a minute was allotted
for each question. Subjects were also told to guess
on all questions about which they were unsure and
to answer all questions. This procedure paralleled
the method of use of the reference manual in op-
erational contexts.


RESULTS 


A 2 X 2 unweighted means analysis of
variance was performed on the criterion
test data. The only statistically significant
main or interaction effect observed was for
reading grade level (F = 5.51, df = 1/33,
p < .05). 


Discussion 


The results of the three experiments here 
described provide insight into two major 
issues: (a) delineation of when a mismatch 
between the literacy skills of technical per-
sonnel and literacy requirements can be ex-
pected to result in less than effective per-
formance and (b) evaluation of methods for
minimizing such deleterious effects.

The results of Experiment 1 indicated a 
statistically significant main effect for read- 
ing grade level. This finding was not unex- 
pected because of the known high correla- 
tion (Madden & Tupes, 1966) between 
Airmen Qualifying Examination scores 
(used here to predict reading grade level)
and reading ability. Little if any gain,
however, was realized from simplification of
the textual materials or from the auditory
supplementation. This seemingly surprising
result might have been anticipated. First,
the study guides, in the present context,
were essentially a lecture and classroom
supplementation. In the formal training
situation involved here, repeated presenta- 
tions of the course content took place 
through lecture, workbooks, programmed 
instruction handouts, remedial sessions, and 
the study guides themselves. Accordingly, if 
the student did not learn the subject matter 
content through one medium, he could
master it through another. With multiple
learning and review presentations, neither
the reading grade level of the written mate-
rial nor the auditory supplementation was
overriding. A statistically significant differ-
ence in likability, however, was observed,
with the subjects showing a preference for
the simplified materials.


Experiment 2, which involved learning
through the home study manual materials
only, did not allow for the "information
redundancy" of Experiment 1. Although the 
high-reading-ability subjects received some- 
what higher criterion scores than the low- 
reading-ability subjects, the difference was 
not statistically significant. It is possible 
that the short study time and difficult learn-
ing task did not permit high reading ability
to manifest itself to advantage. More im-
portantly, a statistically significant differ-
ence among the treatment groups was found
in Experiment 2. The group which received
the simplified home study manual materials
scored higher than the group which received
the original materials. It is clear, then, that
when the luxury of “information redun-
dancy," which might be present in a formal
training course, is absent, rewriting of the
course materials at a lower reading grade
level can be expected to lead to more effec-
tive performance. With reference to the
criterion involved, however, there is only
little in the present data to support the con- 
tention favoring the use of simultaneous 
auditory supplementation. 

The present findings differ from those of 
Sellman (1970), who also used home study 
manual materials to investigate the effects 
of lowered reading grade level and audio 
supplementation on criterion test perform-
ance as a function of mental ability cate-
gory. His results showed a simplified-mate-
rials-plus-tapes group to score higher than
groups using only simplified materials or 
conventional materials. No differences were 
found between the latter two groups. 
Mental ability was also found by him to be 
statistically significant, with subjects in the 
higher mental ability categories performing 
better. There are, however, several differ-
ences between Sellman’s procedure and that
employed here. First, no time limit was im- 
posed on his subjects, with the consequence 
that the simplified-plus-tapes group spent 
a statistically significant greater amount of 
time (as reported by Sellman) with the 
materials than either of his other two 
groups. It would thus appear that time for 
study was confounded with the experi-
mental treatments. Second, his “tapes” con-
dition involved “reiterations of topics al-
ready discussed.” This procedure results in
enforced practice by the “tapes” group
rather than the simple, simultaneous audio 
presentation as employed in this study. 
Finally, Sellman administered the same pre- 
and posttest to his subjects. His results on 
mental ability categories thus might be 
somewhat artifactual since he may have 
been measuring “ability to remember the 
questions.” 

Experiment 3 involved a technical man- 
ual used for reference purposes on the job. 
As in Experiment 1, the better readers scored 
higher on the criterion test. No significant 
treatment effects were found in Experiment 
3 and, accordingly, it seems that with a 
“look-up” type of task, reading grade level 
simplification, as here employed, exerts 
little practical effect. Possibly, in a look-up- 
type multiple-choice test, subjects merely 
have to recognize answers and not, neces- 
sarily, comprehend the materials. It is also 
possible that the material was too difficult 
and/or the time period too short for a treat- 
ment effect to appear. Nevertheless, simpli- 
fication of technical materials of this sort 
might be expected to have only marginal 
effects on performance. Subjects with a higher reading
grade level, on the other hand, can be ex-
pected to read faster and, hence, complete
more items correctly. 

The fact that the ARI and FORCAST 
formulas showed high agreement in their 
reading-grade-level determination for the 
original materials but not for the simplified 
materials (although both formulas agreed 
in direction of the effect), is probably in- 
dicative of their differential sensitivity to 
the simplifying procedure utilized. The re- 
writing procedure clearly favored the short 
sentence-small word approach of the ARI.
It has been noted, however, that readability 
formulas may not apply when the material 
being measured has been intentionally 
simplified (Klare, 1963). Independent veri- 
fication of the simplifying procedure em- 
ployed here by using the cloze procedure 
would thus seem to be critical in any future 
efforts using the paradigm of this study. 

The issue of using grade levels both as 
measures of reading-level achievement as 
well as of prose difficulty is one that should 
be considered. Clearly, although a common
metric is being used, the constructs are not
congruent. Moreover, the reading grade level 
of the trainees was estimated here by using 
an aptitude test (Airmen Qualifying Exami- 
nation) score. Accordingly, the equation of 
passage reading difficulty and subject read- 
ing ability by virtue of equal reading grade 
level assignment is something less than a 
straightforward analogy. In the absence of a 
common metric able to perform this task, 
however, the procedure followed here would 
seem adequate. 

Finally, the traditional criterion test de- 
velopment technique followed here favors 
the inclusion of items that maximize in- 
dividual differences. Thus, the failure to find 
treatment effects on Experiments 1 and 3 
could have possibly been the result of the
insensitivity of the criterion test to such an 
effect. On the other hand, if one adopts this 
view, the statistically significant treatment 
effect observed in Experiment 2 is probably 
a very stable finding. 

In summary, it would seem that the 
maximum gain will accrue by writing 
textual materials which are used in situa- 
tions in which there is little opportunity for 
redundant information, at a reading grade 
level which is concordant with the ability of 
the reader. This situation is typified by the 
home study manual experiment described 
here (Experiment 2). Lesser gain, if any, is 
indicated by emphasis on modification of 
materials employed as adjuncts to other ma- 
terials or for on-the-job reference purposes.


INFORMATION SEEKING FOLLOWING THE CONFIRMATION
OR CONTRADICTION OF BELIEFS¹


CHARLES B. SCHULTZ²
Trinity College 


The effect of contradiction and confirmation on information seeking and acquisi-
tion was investigated. Seventy-five undergraduates selected and examined the
more interesting slide in a pair of slides. The critical slide pairs were comprised of
one member containing belief-congruent information and another containing be-
lief-discrepant information. Before the slides were viewed, evidence was pre-
sented which either contradicted existing beliefs (contradiction), supported exist-
ing beliefs (confirmation), or contradicted and supported existing beliefs (doubt).
Contradiction subjects examined the slides longer and acquired more information
than confirmation subjects. Ratings of interest, selection of slides, examination
of slides, and acquisition of information were in a belief-congruent direction for
the confirmation condition and a belief-discrepant direction for the contradiction
condition.


Learners are often confronted with argu- 
ments or evidence which conflict with beliefs 
they hold. Such discrepant information is 
thought to have at least two effects on 
further information seeking and acquisition: 
(a) According to one theory of curiosity 
(Berlyne, 1960, 1965), support for a dis- 
crepant position sharpens conceptual con- 
flict by strengthening a subordinate belief to 
a point where it competes with the dominant 
belief. As a result of the heightened conflict 
and the curiosity which accompanies it, 
information seeking is intensified and, even- 
tually, reinforced when the search is brought 
to a satisfactory conclusion. (b) According to 
dissonance theory (Festinger, 1957, 1964), 
information regarding a discrepant position 
in particular is not sought and, in fact, is 
avoided because it is more likely to arouse 
or increase dissonance than to reduce it. 

In regard to the first of these theories,
Berlyne (1962) manipulated conceptual con-
flict by varying the number and the equi-
probability of response alternatives (i.e., 
uncertainty). Both factors contributed to 
curiosity. However, curiosity was measured 
in this study and in others (e.g., Heslin, 
Blake, & Rotton, 1972) by self-reports of 
intentions to seek additional information 
rather than by search behavior. A direct 
relationship between uncertainty and in- 
formation search has been demonstrated on 
a variety of tasks ranging from complex 
verbal problems (Lanzetta, 1963) to concept 
learning (Long, 1965), word games, and 
picture identification (Lanzetta & Driscoll, 
1968). Moreover, this relationship appears 
to be mediated by subjective uncertainty 
(i.e., the subject’s estimate of uncertainty) 
which, on the one hand, increases mono-
tonically with uncertainty due to stimulus
and response factors (Driscoll & Lanzetta, 
1965), and, on the other, is directly related 
to information search (Driscoll, Tognoli, & 
Lanzetta, 1966). 

These particular findings on the effect of 
uncertainty on search behavior are con- 
sistent with Berlyne's (1960, 1965) conflict 
theory of curiosity. Berlyne's (1965) theory 
also includes the assumption that informa- 
tion which reduces relatively large amounts 
of curiosity will be strongly reinforced. 
Thus, not only is information-seeking be-
havior reinforced by the reduction of con-
ceptual conflict and, therefore, more likely
to occur on similar occasions, but the 
knowledge derived from the search is more 
likely to be retained than knowledge which 
is not required to reduce conceptual conflict. 
In this regard, Berlyne (1954) found that 
subjects recalled more information from 
statements when they were asked questions 
beforehand and when they regarded the 
statements as familiar. Berlyne inferred
conceptual conflict from ratings of familiarity 
and from the effect of prequestions. A 
primary purpose of the present investigation 
was to examine the effect of directly in-
ducing conceptual conflict, by strengthening
a subordinate response, on both information
seeking and the acquisition of knowledge.
Often information which is necessary for 
the reduction of conceptual conflict is 
discrepant with an individual’s existing be- 
liefs. However, in most cases, information is 
selectively sought in order to avoid ex- 
posure to a contradictory position (Festinger, 
1957, 1964). At least one exception to this 
general rule occurs when the contradictory 
or discrepant information is thought to be 
of practical use (Canon, 1964; Freedman, 
1965). Discrepant information also may be 
sought for its intrinsic utility, that is, for its 
usefulness in resolving the conceptual con- 
flict itself and the ensuing curiosity (Rhine, 
1967). Indeed, Festinger (1964) notes that 
“avoidance [of discrepant information] 
would be observed only under circumstances 
where other reasons for exposure, such as 
usefulness or curiosity, were absent [p. 96]." 
In the present study, two forms of con-
ceptual conflict were examined to determine
under which conditions, if any, selective 
exposure is modified. In one condition, only 
discrepant evidence was presented to the 
subject (contradiction) before examining 
additional materials which contained both 
congruent and discrepant information; in 
the other condition, both congruent and 
discrepant evidence were presented (doubt). 
Since contradiction involves strengthening 
only the subordinate belief while doubt 
involves strengthening both subordinate and 
dominant beliefs, contradiction may direct 
the individual more explicitly toward the 
discrepant position as a potential source of
information for the resolution of conceptual
conflict than doubt. Indeed, discrepant in-
formation may be indispensable for the
resolution of conflict in the contradiction
condition, and, therefore, be of considerable
intrinsic utility. Thus, “playing the devil's
advocate” may be instructionally productive
since it prods learners to seek information 
which disagrees with their existing beliefs 
and which could increase dissonance. 

Two hypotheses were derived from the 
above rationale: (a) When conceptual con- 
flict is aroused by discrepant evidence pre- 
sented by itself or in combination with sup-
porting evidence, an individual seeks and
acquires more information on the topic in
general than he otherwise would. (b) Par-
ticularly under conditions of contradiction,
the individual seeks and acquires more
information on the discrepant position itself
than when his existing beliefs are confirmed.


METHOD 
Subjects 


The subjects were 75 volunteers from an under-
graduate course in educational psychology at the
Pennsylvania State University who earned stand-
ard score points toward their grade for participat-
ing in the experiment. There were 15 subjects
randomly assigned to each treatment. An addi-
tional 15 were assigned to the doubt group because
data from that cell were analyzed for other ex- 
perimental purposes reported elsewhere (Schultz, 
1974). 


Stimulus Materials 


The principal experimental materials were a
modified version of the Festinger and Carlsmith
(1959) study of the cognitive effects of forced com-
pliance. At least two advantages were obtained
by using the Festinger-Carlsmith study as experi-
mental materials: Common sense explanations of
attitude change can be made according to two
conflicting theories (dissonance vs. reinforcement)
and the reinforcement position is almost unani-
mously invoked as an explanation of attitude
change by relatively naive observers. All 75 sub-
jects included in the analyses selected the rein-
forcement position. Five other subjects were ex-
cluded because they chose the dissonance position.
Thus, information about reinforcement theory was
congruent with the subject’s belief, while informa-
tion about dissonance theory was discrepant.

The description of the Festinger-Carlsmith
study was broken down into small units of a para-
graph or several lines which were presented on
2 X 2 inch slides. Two slides were rear projected
simultaneously from two carousel projectors on
a translucent screen. In all, there were 30 slide
pairs which were organized into three segments.
The 16 pairs of slides in the second segment were 
particularly important because they contained 
definitions, assumptions, rationales, and predic- 
tions based on the congruent and discrepant posi- 
tions. Each pair consisted of one congruent and 
one discrepant slide. The congruent slide was 
balanced in length and grammatical form to match 
the discrepant member on the same topic. As an 
illustration, the prediction based on dissonance 
theory appeared on the screen next to the predic- 
tion based on reinforcement theory. The slides 
were titled to identify the content and to indicate 
at a glance which slide in the pair was congruent 
and which was discrepant. For the sake of sim- 
plicity, reinforcement theory was labeled the 
“law of reward" and dissonance theory the ‘‘the- 
ory of conflict." The congruent and discrepant 
slides were randomly assigned to the two projec- 
tors to prevent either type from appearing con- 
sistently on one side of the screen. 

The first segment of slides (seven slide pairs) 
described Festinger and Carlsmith’s experimental 
procedures. The third segment (seven slide pairs) 
described the experimental results. The two slides 
in each pair in both of these segments were iden- 
tical, a procedure adopted to permit the use of the 
same slide-changing routine throughout the ex- 
periment. 


Procedure 


The subject was given a brief overview of the 
tasks he was to perform including practice in the 
slide-changing routine. This routine consisted of 
pressing two control buttons simultaneously, pro- 
jecting a slide pair on the screen. The subject 
decided which was the less interesting slide in the 
pair, turned it off, and examined the remaining 
slide. When he finished, that slide was turned off 
to reveal a blank screen which was the cue to begin
another cycle by pressing both control buttons. 
The subject was free to set his own pace. 

Because subjects were not told that they would 
be tested on the slide content, the task of examin- 
ing the slides required a plausible explanation. 
Accordingly, it was explained that the experi- 
menters were preparing instructional materials 
on the topic of attitude change and that the sub- 
ject could help by identifying what he considered 
to be the most interesting slides. It was stressed 
that only his actual personal preferences would 
be useful. 

At this point, a problem on attitude change was
described to the subject who was given a choice 
between two alternative outcomes. The subject 
had to decide whether a large or a small reward 
was most effective in changing attitudes. Con- 
ceptual conflict was induced by presenting con- 
trived results of an experiment on this topic. The 
subjects in the contradiction condition were given
experimental evidence which disagreed with their 
selection of the larger payment. The subjects in 
the confirmation condition were given evidence
which agreed with their selection of the larger
payment. In the doubt condition, subjects were 
told that two experiments were conducted. They 
were shown the same bogus evidence given the 
confirmation subjects as well as the evidence given 
to the contradiction subjects. An absolute control 
group differed from the experimental groups only 
in that it received neither supporting nor con- 
tradictory evidence. 


Measures 


Each time the subject turned either of the slide 
projectors on or off, his response was registered 
by an event recorder. The measure of information 
seeking obtained from this procedure was examina- 
tion time, that is, the time the subject spent ex- 
amining the more interesting member of any slide 
pair after turning off the less interesting slide. 
Reaction time was the interval between turning on 
both projectors and turning off the less interesting 
slide in the pair. Information acquisition was 
determined by a 33-item multiple-choice test of 
the content on the slides. 

Ratings of interest in “reward” and “conflict” 
slides provided one measure of selective exposure. 
In addition, the subject rated his interest in six 
“reprints from psychological journals." The titles 
of two of these articles were congruent with his 
beliefs, two were neutral, and two were discrepant. 
Ratings for each type of article were averaged to 
provide a measure of selective exposure. 

For each of the 16 pairs of slides in the second 
segment, the subject had the opportunity to select 
congruent or discrepant information. The number 
of discrepant slides chosen (DSC) was tabulated. 
Since the subject may have selected the discrepant 
slides more often than the congruent slides, but 
spent less time examining them, a discrepant time/ 
examination time ratio (D/E ratio) was computed 
to reflect the proportion of time spent examining 
discrepant information. Diserepant time is the 
examination time for only the discrepant slides, 
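
The slide-based measures defined above reduce to simple arithmetic on the event-recorder times. The following Python sketch is illustrative only: the `Trial` record, its field names, and the sample times are hypothetical and are not taken from the original study. It assumes that all times are measured in seconds from the moment both projectors were turned on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    """One Segment 2 slide pair (hypothetical record layout)."""
    off_first: float       # time at which the less interesting slide was turned off
    off_second: float      # time at which the remaining slide was turned off
    kept_discrepant: bool  # True if the slide kept for examination was discrepant


def reaction_time(trial: Trial) -> float:
    # Interval between turning on both projectors (time 0) and
    # turning off the less interesting slide.
    return trial.off_first


def examination_time(trial: Trial) -> float:
    # Time spent on the more interesting slide after the other was turned off.
    return trial.off_second - trial.off_first


def summarize(trials: List[Trial]) -> Dict[str, float]:
    exam_total = sum(examination_time(t) for t in trials)
    discrepant_time = sum(examination_time(t) for t in trials if t.kept_discrepant)
    dsc = sum(1 for t in trials if t.kept_discrepant)
    # D/E ratio: proportion of examination time spent on discrepant slides.
    de_ratio = discrepant_time / exam_total if exam_total > 0 else 0.0
    return {
        "reaction_time": sum(reaction_time(t) for t in trials),
        "examination_time": exam_total,
        "dsc": dsc,
        "de_ratio": de_ratio,
    }


# Hypothetical data for three slide pairs.
example = [Trial(2.0, 12.0, True), Trial(3.0, 8.0, False), Trial(1.0, 6.0, True)]
summary = summarize(example)
```

Keeping DSC and the D/E ratio separate matters for the reason given in the text: a subject could choose many discrepant slides yet examine each only briefly, so the count and the time proportion need not agree.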

The multiple-choice test was comprised of part 
scores with items which required knowledge of 
congruent information (nine items), discrepant 
information (seven items), and both congruent 
and discrepant information (seven items). Other 
items required knowledge of Festinger and Carl- 
smith's (1959) experimental procedures and re- 
sults. 


RESULTS 


In order to assess the extent to which the 
treatments had been induced, subjects 
rated the amount of conflict they perceived 
between slides in a pair on a 100-point scale 
at the end of the experiment. A single- 
factor analysis of variance was conducted 
on these scores. Although the differences fell 
short of significance (F = 2.02, df = 3/71,
p > .05), the direction of the means was


TABLE 1
MEANS AND STANDARD DEVIATIONS FOR
REACTION TIME, EXAMINATION TIMES,
AND ACQUISITION OF KNOWLEDGE

                                   Experimental condition
Measure               Contradiction   Doubt      Control    Confirmation
                      (n = 15)        (n = 30)   (n = 15)   (n = 15)

Reaction timeᵃ
  X                   155.73          235.25     165.47     148.80
  SD                   95.12          148.43     112.05     113.51
Examination timeᵃ
  X                   158.33          126.98     101.70      93.50
  SD                   55.88           70.21      66.67      35.01
Examination timeᵇ
  X                    97.17           89.32      81.97      78.20
  SD                   34.10           46.27      25.07      25.62
Test scoreᶜ
  X                    18.67           20.47      17.40      14.03
  SD                    4.03            3.45       3.40       3.95

ᵃ For congruent and discrepant slides (in seconds).
ᵇ For Festinger-Carlsmith results (in seconds).
ᶜ Items correct.


consistent with expectations and with re- 
sults on dependent measures reported be- 
low. Both the contradiction (X = 74.81) ...
reported ... control (X = 69.33) while the
confirmation group (X = 58.47) reported less.
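
The degrees of freedom reported for the single-factor analysis of variance on the conflict ratings (df = 3/71) follow from four treatment groups and 75 subjects: 4 - 1 = 3 between groups and 75 - 4 = 71 within groups. The Python sketch below shows one conventional way to compute a one-way F ratio. The function name and the generated ratings are hypothetical, and this is not a reconstruction of the author's actual computations.

```python
from typing import List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a single-factor design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def fake_ratings(n: int, base: float) -> List[float]:
    # Hypothetical ratings, used only to show the degrees of freedom
    # for group sizes of 15, 30, 15, and 15.
    return [base + (i % 5) for i in range(n)]


groups = [fake_ratings(15, 70.0), fake_ratings(30, 68.0),
          fake_ratings(15, 66.0), fake_ratings(15, 58.0)]
f_ratio, df_between, df_within = one_way_anova(groups)
# df_between is 3 and df_within is 71, matching the df = 3/71 in the text.
```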

Findings relevant to the hypothesis that 
contradiction results in greater examination 
and recall than confirmation are summarized 
in Table 1. The analysis of variance of 
examination time devoted to slides which 
paired congruent and discrepant informa- 
tion (Segment 2 slides) was significant (F = 
3.46, df = 3/71, p < .025). In order to
test pairwise differences among the means, 
Dunn’s multiple-comparison procedures were 
used in this and all other analyses (p = .05). 
Examination time for the contradiction con- 
dition was significantly longer than for the 
confirmation condition. The examination 
time for pairs of identical slides which 
depicted the results of the Festinger-Carl- 
smith experiment (Segment 3 slides) was 
similar to the examination times for Seg- 
ment 2, but the differences in this case were
not significant (p > .05). A similar analysis
of variance revealed that the groups dif-
fered significantly in the amount of informa-
tion acquired (F = 8.05, df = 4 ...
... conditions had higher scores on the multiple-
choice test than the confirmation condition
according to Dunn’s procedures, although
they did not differ from each other. Finally,
doubt subjects spent more time than those
in other conditions deciding whether the
congruent or discrepant member of each
pair of slides in Segment 2 was more inter-
esting; however, this difference was not
reliable (F = 1.82, df = 3/71, p > .05).
Selective exposure was measured by the 
selection of articles for future reading which 
agreed (congruent) or disagreed (discrepant) 
with subject’s existing beliefs, the interest 
reported in congruent or discrepant slides, 
the discrepant slide choice (DSC), the time 
spent examining discrepant slides (D/E
ratio), and finally, the acquisition of informa- 
tion about the congruent or discrepant 
positions. The findings obtained on each 
dependent measure will be examined in turn. 
The means displayed in Table 2 summa- 
rize the ratings of interest in reading articles 
which are congruent or discrepant with a 
belief the subject holds or which are un-
related to that belief (i.e., neutral articles).
Since neutral articles neither supported nor
contradicted the subject's existing beliefs, 
there was no reason to expect differences in 
the ratings of these articles across treat- 
ments. A single-factor analysis of variance 
yielded no significant differences among the 
treatment groups. However, a similar analy- 
sis of the selection of congruent articles was 
significant (F = 3.65, df = 3/71, p < .025).
Congruent articles were rated more interest- 
ing by confirmation subjects than by sub- 
jects in the contradiction condition. Con-
versely, discrepant articles tended to be
rated more interesting by subjects in the 
contradiction and doubt conditions than by 
subjects in the confirmation condition, al- 
though these differences were less reliable
than those for congruent articles (F = 1.74,
df = 3/71, p > .05).
These findings imply a Type of Article X
Level of Conceptual Conflict interaction.
In order to test this assumption, the ratings
of congruent and discrepant articles were
treated as repeated measures in a 2 X 2


TABLE 2
MEANS FOR MEASURES OF SELECTIVE EXPOSURE

                                                  Experimental condition
Measure                                  Contradiction  Doubt     Control   Confirmation
                                         (n = 15)       (n = 30)  (n = 15)  (n = 15)

Ratings of interest in articles (100-point scale)
  Congruent                              61.07          68.40     71.33     74.07
  Neutral                                66.20          63.37     59.60     60.33
  Discrepant                             70.73          67.53     59.53     56.73

Ratings of interest in slides (100-point scale)
  Congruent                              43.00          63.40     75.13     69.47
  Discrepant                             79.93          68.70     51.33     55.27

Choice and examination of slides
  Choice of discrepant slides            12.07           6.43      6.20      5.26
  D/E ratio                                .76            .40       .36       .94

Part scores on multiple-choice test (in percentages)
  Congruent information (n = 9)          45.13          62.63     60.73     48.07
  Congruent and discrepant
    information (n = 7)                  52.33          60.47     57.13     41.00
  Discrepant information (n = 7)         67.60          64.80     37.07     41.93


mixed factorial analysis of variance design 
with two treatment levels (confirmation 
and contradiction) and two repeated meas- 
ures (congruent and discrepant ratings). 
This analysis yielded a significant Treat- 
ment X Type of Article interaction (F =
7.48, df = 1/28, p < .025).

A single-factor analysis of interest re- 
ported in congruent slides yielded significant 
effects (F = 5.89, df = 3/71, p < .005) 
which were consistent with those regarding 
the selection of articles. Contradiction sub-
jects rated the congruent slides as less
interesting than subjects in each of the
other three groups. A similar analysis of
discrepant slides also yielded significant
effects (F = 3.58, df = 3/71, p < .025)
in which contradiction subjects rated dis-
crepant slides higher than confirmation
subjects. Since these findings also imply a
Treatment X Type of Slide interaction, a
2 X 2 analysis of variance was conducted
with ratings of congruent and discrepant
slides treated as repeated measures. Accord-
ing to this analysis, there was a significant
interaction effect (F = 14.61, df = 1/28,
p < .001).

Single-factor analyses of variance yielded 
significant effects on both the number of
discrepant slides chosen (F = 5.50, df =
3/71, p < .01) and the ratio of discrepant
time to examination time (F = 5.28, df = 
3/71, p < .01). The differences among the 
means were in the predicted direction and 
consistent with the findings based on interest 
scales (see Table 2). The greatest DSC and 
D/E ratios occurred in the contradiction 
condition, while the smallest DSC and 
D/E ratios occurred in the confirmation 
condition. Analyses of pairwise comparisons 
of both measures using Dunn’s procedures 
indicated that all treatments differed from 
the contradiction treatment (p < .05) but 
not from each other. 

The scores from three sections of the 
multiple-choice test of the experimental 
topic were converted to percentages to
permit comparison (Table 2). According to 
a single-factor analysis of variance, there 
was a significant treatment effect on the set
of questions requiring both congruent and
discrepant information (F = 2.88, df =
3/71, p < .05). Although scores of the
doubt condition were higher than the 
other experimental treatments, Dunn's pro- 
cedures revealed that the difference was 
significant only for the control group. 
Analysis of the set of items which measured 
retention of congruent information yielded 
a significant treatment effect (F = 3.31, 
df = 3/71, p < .05) in which confirmation 
subjects had higher scores than contradic- 
tion subjects. In contrast, the analysis of the 
items which measured retention of dis- 
crepant information yielded an F of 7.98 
(df = 3/71, p < .01) in which confirmation 
and control subjects scored lower than 
contradiction and doubt subjects. Since 
these results imply a Treatments X Kind of 
Information Retained interaction, a repeated 
measures analysis of variance was conducted 
using the procedures described above. This 
analysis yielded a significant Treatment X 
Type of Item interaction (F = 15.07, 
df = 1/28, p < .001). 
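
Because the three part scores rest on different numbers of items (nine congruent, seven discrepant, and seven requiring both), raw counts are not directly comparable, which is why they were converted to percentages. A minimal Python sketch of that conversion follows. The dictionary keys and the sample raw scores are hypothetical; only the item counts come from the text.

```python
from typing import Dict

# Item counts per part score, as stated in the Measures section.
ITEMS_PER_PART: Dict[str, int] = {
    "congruent": 9,
    "congruent_and_discrepant": 7,
    "discrepant": 7,
}


def to_percentages(raw_correct: Dict[str, float]) -> Dict[str, float]:
    """Convert the number of items correct on each part to percent correct."""
    return {
        part: 100.0 * raw_correct[part] / n_items
        for part, n_items in ITEMS_PER_PART.items()
    }


# Hypothetical raw scores for one subject.
percentages = to_percentages(
    {"congruent": 9, "congruent_and_discrepant": 3.5, "discrepant": 0}
)
```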


Discussion 


According to the present findings, the 
presentation of evidence which contradicts 
existing beliefs produces greater interest in 
and examination of the experimental topic 
than would otherwise occur. Not only was
more information sought when evidence
supporting a subordinate belief was pre-
sented alone or paired with evidence support- 
ing a dominant belief, but it was also re- 
tained. This finding, in particular, is con- 
sistent with Berlyne's (1960, 1965) theory of 
epistemic curiosity. 

Exactly what accounts for the greater 
retention in the contradiction and doubt 
conditions is not clear from the present 
results. In regard to predecision information 
seeking, at least the effect of uncertainty is 
primarily reflected on indices of information 
processing (Driscoll & Lanzetta, 1964). 
Accordingly, the greater retention in the 
contradiction and doubt conditions may 
have been due to processes such as rehearsal 
or reorganization. In any event, the superior 
retention obtained by strengthening a sub- 
ordinate response alternative extends find- 
ings by Berlyne (1954) and later by Frick 
and Cofer (1972) which demonstrated that 
curiosity generated by prequestions results in 
greater retention. 

The preference for congruent information 
by subjects in both the confirmation and 
control conditions, which did not differ from 
each other, clearly supports the selective 
exposure hypothesis derived from Festinger's 
(1957) theory of cognitive dissonance. However, 
the tendency of subjects in the 
contradiction condition to select and acquire 
discrepant information is difficult to account 
for by the theory of cognitive dissonance 
unless one assumes that contradictory 
evidence produced extreme dissonance. 
Typically, one would expect the presentation 
of evidence which contradicts subjects' 
existing beliefs to increase dissonance 
moderately, thereby resulting in a greater 
selection of congruent information than 
would occur without a dissonance-increasing 
manipulation (Freedman & Sears, 1965). 
The differences in selectivity between the 
confirmation and contradiction conditions 
may be due, in part, to the greater intrinsic 
utility the discrepant information held for 
the contradiction subjects. According to this 
hypothesis, discrepant information will be 
sought when knowledge of the existing 
belief is not sufficient to reduce curiosity. 
Thus, contradictory information is sought 
for its intrinsic utility, even at the risk of 
increasing dissonance, just as it is 
sought for its practical or extrinsic utility 
(Canon, 1964; Freedman, 1965). 
This formulation, which appears to be 
supported by the present findings, may explain 
other instances in which discrepant 
information was not avoided, or was 
sought. For example, in one study, subjects 
read heavily biased case reports of a criminal 
trial which favored either the defense or 
prosecution summation (Sears, 1965). The 
results indicated that subjects preferred 
information which was opposite both to the 
biased “factual” report they had read and 
to their own opinions. Since it is well known 
that there are both defense and prosecution 
sides to any legal issue, the mock trial 
experimental situation may have generated 
curiosity which required discrepant information 
for its reduction. Similar interpretations 
can be made of other mock trial studies in 
which discrepant information was preferred 
or in which preference for congruent infor- 
mation was not found (Sears, 1966; Sears 
& Freedman, 1965). 

To the extent that the present analysis is 
valid, intrinsic utility is a factor to be in- 
vestigated in its own right as well as a 
variable to be controlled in selective expo- 
sure experiments. Since whatever creates 
dissonance may also arouse epistemic 
curiosity, the effects of the two variables 
may be difficult, if not impossible, to 
separate (Rhine, 1967). Nevertheless, in the 
present experiment, only the explicit con- 
tradiction of the subjects’ beliefs generated 
the amount or type of conflict necessary to 
direct them toward discrepant information. 
Whatever conflict was created by simply 
juxtaposing both congruent and discrepant 
information in the slide pairs of 
Segment 2 (i.e., the control condition) 
was not sufficient to prompt subjects to risk 
the increase in dissonance implied by the 
examination of discrepant information. 

Differences in selectivity between the 
contradiction and doubt conditions are 
particularly interesting and are especially 
important because of their implication for 
education. Contradiction typically directs 
the subject’s interest in and examination 
and retention of information toward the 
discrepant position at the expense of con- 
gruent information. The effects of doubt 
are not so myopic. On almost every meas- 
ure doubt scores are relatively high. More- 
over, they are consistently higher than 
confirmation scores on measures of prefer- 
ence for discrepant information and higher 
than contradiction scores on measures of 
preference for congruent information (Table 
2). Only on the measures of choice of dis- 
crepant slides and time spent examining 
discrepant slides does the behavior of doubt 
subjects resemble that of confirmation sub- 
jects. This effect may be explained by the 
tendency of doubt subjects to spend more 
time looking at both slides in a pair before 
turning off the less interesting member. 

Thus, doubt appears to produce a rela- 
tively balanced examination of congruent 
and discrepant views which is necessary for 
the study of controversial issues. In contrast, 
confirmation directs information seeking 
toward a congruent position in apparent 
accordance with dissonance theory, while 
contradiction directs information seeking 
toward a discrepant position, perhaps be- 
cause of the intrinsic utility that informa- 
tion has acquired. 

INFLUENCE OF CULTURAL BACKGROUND ON 
HIERARCHICAL LEARNING 


RUSSELL D. LINKE* 


Monash University, Clayton, Australia 


This study researched the cross-cultural validity of a learning hier- 
archy of graphical interpretation skills with 192 Grade 7 students in 
Australia and 200 indigenous Grade 9 students in Papua New Guinea. 
Both groups contained approximately equal numbers of male and fe- 
male students, with a mean age of 12.5 and 15.5 years, respectively for 
Australia and Papua New Guinea. The learning hierarchy was constructed 
by logical task analysis, and validated by means of a comprehensive 
instructional and testing program. Each pair of hierarchically 
related skills was tested for significance (p ≈ .001) and power 
with a modified version of the R. T. White and R. M. Clark test of 
inclusion. Results indicate that the pattern of acquisition of interpre- 
tative skills was substantially the same for both groups of students, 
irrespective of their different cultural backgrounds, providing impor- 
tant cross-cultural evidence for the principle of hierarchical learning. 




The principle of hierarchical learning in- 
volves the sequential acquisition of logically 
related and progressively complex capa- 
bilities, with a fundamental condition that 
the learning of any particular skill is de- 
pendent on the prior learning of specified 
prerequisite or subordinate skills. This prin- 
ciple is more or less consistent with many 
recent models of learning and instruction, including 
those of Gagné (1965), Ausubel (see 
Ausubel & Robinson, 1969, pp. 37, 59), Bloom 
(1956, p. 30), and Merrill (see Tennyson 
& Merrill, 1971), but it is probably presented 
most explicitly in that proposed by 
Gagné. The logical foundation and struc- 
tural simplicity of Gagné’s model have made 
it particularly suitable for experimental re- 
search, and a number of validation studies, 
based on the model established by Gagné 
and Paradise (1961), have been conducted in 
recent years. These studies have so far been 
concerned, however, with basic methodological 
problems involved in the construction 
and validation of learning hierarchies, as indicated 
in recent reviews by Walbesser and 
Eisenberg (1972) and White (1973a). 

Many of the more important methodological 
problems inherent in the earlier studies, 
recently discussed by White (1973a) 
and Linke (1973), have now been substantially 
resolved, and a considerable amount 
of evidence established in support of 
Gagné's model. This evidence, however, has 
largely been derived from isolated valida- 
tion studies in Australia and, more particu- 
larly, in the United States of America, with 
no previous attempt at replication in other 
geographical areas, or with students from a 
different type of cultural environment. Thus, 
prior to this research, it had yet to be es- 
tablished that Gagné’s model of hierarchical 
learning, or the principle on which this 
model was based, was in fact a comprehen- 
sive one, independent of cultural back- 
ground. The present research was thus in- 
tended to provide a foundation of cross- 
cultural evidence for Gagné’s model, through 
the construction and validation of a com- 
mon learning hierarchy of graphical inter- 
pretation skills with two groups of students 
from widely different cultural and educa- 
tional backgrounds. 

Victoria (Australia) and Papua New 
Guinea were chosen as suitable areas for 
the two comparative validation studies. This 
choice was made partly for pragmatic rea- 
sons (Papua New Guinea being at that time 
an Australian protectorate with a common 
educational language, and thus relatively 
accessible for comparative educational re- 
search) and partly because of the well-es- 
tablished differences in cultural and educa- 
tional background between Australian stu- 
dents and those in Papua New Guinea 
(Prince, 1969; Ralph, 1968). It was there- 
fore expected that a common validation re- 
sult with students from these two areas 
would provide a valuable basis of cross- 
cultural evidence for the principle of hier- 
archical learning. 


Construction and Trial Validation of the 
Learning Hierarchy 


A comprehensive learning hierarchy of 
graphical interpretation skills (see Figure 
1) was constructed by the method of task 
analysis initially proposed by Gagné and 
Paradise (1961), then checked for logical 
consistency by a number of curriculum ex- 
perts. An extensive analysis was subse- 
quently made of possible subdivisional skills 
within each of these basic interpretative 
abilities. For example, the ability to calculate 
coordinate position on a two-dimensional 
grid (Element 1/1[A] in the postulated 
learning hierarchy) was examined for 
possible differences arising from axis orien- 
tation (horizontal or vertical), numerical 
value (integral or decimal), or sign (posi- 
tive or negative). This analysis of subdivi- 
sional skills was intended to define more 
precisely the limits of lateral transfer in- 
herent in each of the basic skills, and thus 
to avoid invalidating likely connections 
through comparison of different subdivi- 
sional skills at successive levels of the learn- 
ing hierarchy. This procedure was initially 
used by White (1971), although the method 
of statistical analysis for differentiating in- 
dependent subdivisional skills was later 
modified by Linke (1973). 

Following the analysis of subdivisional 
skills, certain modifications were made to 
the postulated learning hierarchy, so that 
each basic skill was defined within a similar 
range of lateral transfer (subdivisional) 
conditions. Appropriate instructional and 
testing materials were subsequently prepared 
for a trial validation study. These 
materials were presented in programmed 
form, as proposed in an earlier study by 
White (1971), so that each of the 
skills in turn was defined with appropriate 
instructions, illustrated with a fully worked 
example, and finally examined with two 
analogous questions. The sequence of presentation 
generally followed a linear pattern, 
progressing from simpler to more complex 
skills along each strand of the learning hier 
archy, with revisionary questions included 
at each of the branching points. A limited 
validation trial was then conducted in Victoria 
with a single class of students at each 
of three consecutive academic levels (Grades 
6-8). This trial was intended as a practical 
check on the presentation sequence of skills. 
It also provided general information on the 
applicability of instructional materials for 
each of the three selected grades, from which 
the most appropriate level for the major 
validation study could then be determined. 
A similar trial was conducted with the same 
validation program in Papua New Guinea, 
so that any problems arising specifically 
with students in that area could be examined, 
and if possible rectified in the major study. 

The most obvious result emerging from 
the validation trials was the difference between 
students in Victoria and Papua New 
Guinea with respect to general reading abil- 
ity. This appeared to be substantially lower 
for students in Papua New Guinea than for 
those at equivalent grade levels in Victoria. 
To some extent this result had been antici- 
pated, since similar differences in reading 
ability, and in the understanding of common 
scientific terms, had previously been re- 
ported by Gardner for secondary students 
in Victoria (Gardner, 1972) and Papua New 
Guinea (Gardner, 1971). These results had 
been considered in preparing the validation 
program, so that any terms obviously unfamiliar 
to the students, such as "gradient," 
"axis," "tangent," and "displacement," were 
explained in more familiar terms and illustrated 
by reference to appropriate examples. 

[Figure 1 appeared here as a branching diagram. Its recoverable text is listed below by element code; bracketed words are reconstructed from damaged text.]

BASIC SKILLS OF GRAPHICAL INTERPRETATION - AN 
OUTLINE OF THE POSTULATED LEARNING 
HIERARCHY 

CLASSIFICATION CODE 

The first number for each element represents the 
interpretative area, and the second number indicates 
the level within that area, rated downward from the 
relevant terminal skill. Where letters are also used, 
these indicate secondary sequences within the same 
interpretative area. 

INTERPRETATIVE AREA 

1. Position (Location of Co-ordinate Points) 
2. Position (Interpolation and Extrapolation) 
3. Position (Classification of Turning Points) 
4. Displacement 
5. Gradient 
6. Area 

1/1(A) Calculate the Horizontal or Vertical position of a point, specified by one alternate co-ordinate, on a two-dimensional line-[segment graph]. 
1/1(B) Mark the position of a point, specified by one co-ordinate (Horizontal or Vertical) on a two-dimensional line-segment graph. 
1/2 Calculate the Horizontal or Vertical position of a given point on a two-dimensional [grid]. 
1/3 Calculate the position of a given point on a single Horizontal or Vertical number line. 

2/1(A) Calculate the Horizontal or Vertical position of a point, specified by one co-ordinate, interpolated between a given row of points on a two-[dimensional] grid. 
2/1(B) Calculate the Horizontal or Vertical position of a point, specified by one co-ordinate, extrapolated beyond a given line segment (or row of points) on a two-dimensional grid. 

3/1 Identify from mixed sample of convex and concave curves, those with a Maximum or Minimum Turning Point. 
3/2(A) Calculate the Maximum or Minimum value of a curve drawn on a two-dimensional grid. 
3/2(B) Identify and mark the Turning Point on a given curve. 

4/1 Calculate the Horizontal or Vertical displacement between two given points on a two-dimensional grid or line-segment graph. 
4/2 Calculate the displacement between two given points on a single Horizontal or Vertical number line. 
4/3 Calculate the difference between two positive integral numbers. 

5/1 Calculate the gradient of a curve on a two-dimensional grid at a fixed point of contact specified by one (Horizontal or Vertical) co-ordinate. 
5/2(A) Calculate the gradient of a straight line segment drawn on a two-dimensional grid. 
5/2(B) Draw the Tangent to a curve on a two-dimensional grid, at a fixed point of contact specified by one (Horizontal or Vertical) co-ordinate. 
5/3(A) Calculate the gradient of a straight line segment, given both the Horizontal and Vertical displacement values. 
5/3(B) Draw the Tangent to a curve at a given point of contact. 
5/4(A) Calculate the quotient of two positive integral numbers (exact results). 

6/1 Calculate the approximate area (by the method of counting squares) enclosed between two points of a given line segment, each specified by one co-ordinate, and the Horizontal axis of a two-dimensional grid. 
6/2 Calculate the approximate area (by the method of counting squares) enclosed between two marked points on a given line segment and the Horizontal axis of a two-[dimensional] grid. 
6/3(A) Classify the blocks to be counted in order to calculate the area of a specified section on a two-dimensional line-segment graph, where some of the blocks are cut by the given line. 
6/3(B) Calculate the area of a single block on a two-dimensional grid from the Horizontal and Vertical scale calibrations. 
6/4(B) Calculate [the] area of a rectangular block, given the values for length and height. 
6/5(B) Calculate the product of two positive integral numbers. 

Figure 1. Basic skills of graphical interpretation—an outline of the postulated learning hierarchy. 

Apart from pointing to the differences in 
general reading ability, the validation trials 
revealed no major inconsistencies in the pos- 
tulated learning hierarchy. However, cer- 
tain instructional deficiencies were observed 
in a few of the more complex skills, and thus 
additional examples were later provided in 
the final validation program. 

The selection of testing levels for the ma- 
jor validation studies was based on the over- 
all range of reading and computational abil- 
ity in each of the trial grades, which in turn 
was indicated by the range of difficulty lev- 
els for individual skills across the total 
learning hierarchy. Consideration was also 
given to possible interference from previous 
curricular experience in related subject 
areas, but a study of the relevant syllabus 
outlines for both science and mathematics 
suggested that this was unlikely to be a 
serious problem for either group of students. 
Form 1 (Grade 7) was finally chosen as the 
most appropriate level for testing in Vic- 
toria, and Form 3 (Grade 9) for Papua New 
Guinea. 


Administration of the Major Validation 
Studies 


The major validation study in Victoria 
involved a total of 192 Form 1 students from 
11 randomly selected high schools in Mel- 
bourne, Australia. This group contained ap- 
proximately equal numbers of male and fe- 
male students, ranging in age from 11 to 14 
years (taken to the nearest year) with a 
mean of 12.5. The corresponding study in 
Papua New Guinea involved a total of 200 
indigenous Form 3 students from all six cen- 
tral district high schools following the na- 
tional curriculum. This group, which repre- 
sented almost half the total number of in- 
digenous students at that level in the central 
district, contained 139 male and 61 female 
students, the combined group ranging in age 
from 13 to 19 years, with a mean of 15.5. 

The administrative conditions were simi- 
lar for both of the major studies. The vali- 
dation program, which contained both in- 
structional and testing materials, was di- 
vided into three convenient sections, each 
dealing with a different set of skills. These 
sections were presented consecutively, and 
since no time limit was imposed, short 
breaks were given for relief at regular intervals 
throughout the testing period. The 
working rate was relatively constant for all 
of the classes involved, and no significant 
difference was observed in the range of completion 
times between the various classes in 
Victoria and those in Papua New Guinea, 
although in the latter case the students were 
several years older, and nominally at a 
higher academic level. The first of the students 
to finish in each class completed the 
total program in less than 60 minutes, while 
the last took approximately twice this 
amount of time. 

[Figure 2 appeared here: a 3 × 3 contingency table of questions correct (0, 1, 2) on the lower skill against questions correct (0, 1, 2) on the higher skill (4/1), with marginal totals. The cell counts are not recoverable from the source. Number of exceptions = 1.] 

Figure 2. Contingency table format used for 
validation results. (Results derived from the major 
validation study in Papua New Guinea for the 
postulated hierarchical connection between elements 
4/2 and 4/1.) 


RESULTS 


The results for each pair of hierarchically 
related skills were arranged in a 3 × 3 contingency 
table (see Figure 2), with the number 
of questions correct for each individual 
skill represented by the appropriate margi- 
nal totals. Thus the number of exceptions 
observed for each of the postulated hier- 
archical connections—that is, the number 
of students with neither question correct for 
the lower skill, and both correct for the 
higher skill—was indicated in the bottom, l 
right-hand (0/2) cell. This number was 
then compared with an expected maximum 
or critical number, calculated from the var- 
ious marginal totals according to the method 
proposed by White and Clark (1973), and 
if the actual number of exceptions observed 
was greater than that expected, then the 
postulated connection was rejected as invalid 
under the relevant null hypothesis. 
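
The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the function names are hypothetical, and the critical number is passed in as a given value, because the White and Clark (1973) calculation from the marginal totals is not reproduced here.

```python
def contingency_table(pairs):
    """Build a 3 x 3 table where table[lower][higher] counts students.

    Each pair is (questions correct on the lower skill,
    questions correct on the higher skill), each in {0, 1, 2}.
    """
    table = [[0] * 3 for _ in range(3)]
    for lower, higher in pairs:
        table[lower][higher] += 1
    return table


def count_exceptions(table):
    """Students with neither lower-skill question correct but both
    higher-skill questions correct (the 0/2 cell)."""
    return table[0][2]


def connection_rejected(table, critical_number):
    """Reject the postulated connection when the observed exceptions
    exceed the critical number (supplied externally; an assumption here)."""
    return count_exceptions(table) > critical_number


# Hypothetical scores for 100 students (not data from the study).
scores = (
    [(2, 2)] * 40 + [(2, 1)] * 10 + [(2, 0)] * 15
    + [(1, 1)] * 8 + [(1, 0)] * 12
    + [(0, 0)] * 14 + [(0, 2)] * 1
)
table = contingency_table(scores)
exceptions = count_exceptions(table)
rejected = connection_rejected(table, critical_number=5)  # 5 is illustrative
```

With these invented scores there is a single 0/2 exception, so a connection with a critical number of 5 would be retained.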

This procedure for the analysis of individual 
hierarchical connections followed 
basically the method proposed by White 
(1971), although certain modifications were 
made to the statistical validation technique. 
For example, the null hypothesis used by 
White was that no exceptions should be 
allowed other than those arising through er- 
rors of measurement, yet this ignored the 
possibility of legitimate exceptions arising 
through the use of unidentified prerequisite 
skills or alternative learning pathways. The 
calculation of displacement, for example, 
was postulated in this case to be dependent 
on the skills of subtraction and coordinate 
location, but could also be achieved, without 
these skills, by counting the number of 
marked divisions from one point to the 
other, although this would be a relatively 
inefficient procedure, and probably rarely 
used for graphical interpretation. In ac- 
cordance with this argument, and with 
Gagné's (1970) more recent orientation to- 
ward substantial, rather than absolute levels 
of hierarchical dependence, the null hypothesis 
was tested at three different levels of 
stringency. These included an absolute level 
(designated 00) as previously defined by 
White (1971), and two additional levels (01 
and 02) which allowed for 1% and 2%, re- 
spectively, of those possessing the higher 
skill to lack the postulated prerequisite, and 
thus to have used alternative or unspecified 
subordinate skills. The power in each case 
was calculated against the same alternative 
hypothesis, which in turn was defined to al- 
low a much higher level of exceptions (10%), 
thus indicating no substantial relationship 
of hierarchical dependence. 

The overall level of significance was arbi- 
trarily set at .05, so that the probability of 
a Type I error (false rejection of the null 
hypothesis) for each of the 61 tests (including 
parallel connections for different subdivisional 
conditions) in each of the major 
validation studies was determined by 
$\alpha = 1 - \sqrt[61]{1 - .05} \approx .001$ (Linke, 1973). 
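
The per-test significance level can be checked numerically. The sketch below assumes that the 61 tests are treated as independent, so that an overall Type I error rate of .05 corresponds to a per-test level of 1 minus the 61st root of .95; the function name is hypothetical.

```python
def per_test_alpha(overall_alpha, n_tests):
    """Per-test significance level that yields the given overall level
    across n_tests independent tests (an assumption of this sketch)."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / n_tests)


alpha = per_test_alpha(0.05, 61)

# Sanity check: 61 independent tests at this level give back .05 overall.
overall = 1.0 - (1.0 - alpha) ** 61
```

The result is a little under .001, which agrees with the rounded per-test level given in the text.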

A summary of results for both of the major 
validation studies is presented in Table 
1. These results indicate the number of 0/2 
cell exceptions observed for each of the postulated 
hierarchical connections, the maximum 
or critical number expected from the 
marginal totals of the relevant contingency 
table, together with the appropriate null hy- 
pothesis (00, 01, or 02), and for each of the 
valid connections the statistical power of 
the test. Certain of the postulated connec- 
tions were tested in several parallel forms, 
corresponding to different subdivisional 
groups within the relevant skills, but since 
these results were generally quite consistent, 
only one of the parallel forms has been pre- 
sented in Table 1. 


Discussion 


In spite of differences in difficulty level 
for individual skills, the results for both val- 
idation studies were generally quite con- 
sistent, and clearly showed that the learning 
hierarchies validated independently for high 
school students in Victoria and Papua New 
Guinea were essentially the same. In each 
case, most of the postulated connections 
were accepted as valid, and all but one of 
these at the absolute null hypothesis (00) 
level. This result provides further evidence 
in support of Gagné’s model, and estab- 
lishes, at least in a limited sense, that the 
principle of hierarchical learning may be 
largely independent of both cultural and 
educational background. Before discussing 
the significance of this result, however, it 
may be useful to explain a few points of 
methodological importance. 

The occasional inconsistencies observed 
in these results, for example the postulated 
connections between skills 1/2 and 1/1(B), 
1/1(A) and 2/1(B), and 1/1(A) and 
3/2(A), which were each rejected at the 
weakest null hypothesis level in one of the 
validation studies, occurred through inci- 
dental acquisition of two positional skills 
(1/2 and 1/1(A)) in the course of subse- 
quent attempts at more complex capabili- 
ties. An analysis of test-retest results for 
the two anomalous skills confirmed this hy- 
pothesis of incidental acquisition, and indi- 
cated that the problem was caused by repe- 
tition of certain instructional procedures for 
each of the relevant superordinate skills, 
which effectively provided an additional opportunity 
to learn the postulated prerequisites. 
Whether these results should be classified 
as legitimate exceptions to the postulated 
learning hierarchy is an arguable 
point. It is possible, for example, as suggested 
by White (1971), that the subordinate 
skills might in fact have been acquired 
first within the context of the higher skill, 
and then immediately used to solve the more 
complex questions. Thus, any students succeeding 
at the higher skill, and subsequently 
found to have acquired incidentally the postulated 
subordinate skills, could be eliminated 
from the relevant validation test for 
want of definitive information on which of 
the skills was actually first acquired (White, 
1971). It could also be argued, however, 
that any skill acquired incidentally should 
not be classified as an independent prerequisite 
ability, and this leads in turn to an unresolved 
argument of basic definition. It 
suffices to conclude for this study, however, 
that the validation test was conservative in 
accounting for all exceptions to the postulated 
learning hierarchy. 

TABLE 1 
OUTLINE OF VALIDATION RESULTS 

                                    Victoria                     Papua New Guinea 
                            Number of exceptions            Number of exceptions 
Lower skill  Higher skill   Observed  Expected*  Power      Observed  Expected*  Power 
[?]          [?]            4         6/01       .97        3         3/00       1.00 
[?]          1/1(B)         15        12/02      —          5         [?]        [?] 
[?]          [?]            3         [?]        .49        4         6/00       .97 
1/1(A)       2/1(A)         14        12/02      —          19        [?]        [?] 
1/1(A)       2/1(B)         8         8/00       [?]        17        [?]        [?] 
1/1(A)       3/2(A)         13        12/02      —          7         9/00       [?] 
3/2(A)       3/1            4         15/00      .02        [?]       20/[?]     [?] 
3/2(B)       3/1            2         7/00       .12        [?]       9/00       [?] 
4/3          4/2            0         2/00       1.00       0         1/00       1.00 
1/3          4/2            0         1/00       1.00       1         2/00       1.00 
4/2          4/1            1         5/00       .97        1         9/00       .90 
1/2          4/1            0         6/00       .95        2         8/00       [?] 
5/4(A)       5/3(A)         2         3/00       1.00       1         4/00       [?] 
5/3(A)       5/2(A)         3         3/00       .58        0         2/00       [?] 
4/1          5/2(A)         1         3/00       .59        3         3/00       [?] 
5/3(B)       5/2(B)         0         6/00       .74        1         8/00       .8[?] 
1/1(B)       5/2(B)         2         4/00       .90        0         3/00       1.00 
5/2(B)       5/1            0         6/00       .01        0         9/00       .01 
5/2(A)       5/1            0         7/00       .01        1         11/00      .01 
6/5(B)       6/4(B)         0         2/00       1.00       0         2/00       1.00 
6/4(B)       6/3(B)         4         4/00       .98        1         2/00       1.00 
6/3(B)       6/2            0         5/00       .43        0         8/00       .36 
6/3(A)       6/2            0         7/00       .29        0         11/00      .18 
6/2          6/1            2         6/00       .45        2         8/00       .04 
1/1(B)       6/1            0         2/00       .64        0         1/00       .50 

* The second number in this column indicates the appropriate null hypothesis. 
[?] marks an entry that is illegible in the source; the fragment "1/2" also appears among the first rows and probably belongs to their lower-skill column. 

Another source of inconsistent results, 
in fact one of the most significant problems 
with this validation technique, was the 
method of calculating statistical power. In 
some of the postulated connections, for example, 
the total number of students who 
failed the subordinate skill was lower than 
the calculated critical number of 0/2 cell 
exceptions, so that the relevant connection 
could not possibly be rejected. In cases such 
as this, the power, or probability of detecting 
a genuinely invalid connection, should 
have been extremely low, but since the alternative 
hypothesis took no account of subordinate-skill 
difficulty levels, the resultant 
levels of power were unrealistically high. A 
similar problem occurred in the previous 
validation study by White (1971), and was 
caused by the same inadequate formulation 
of the alternative hypothesis. In contrast 
with White's formulation, however, the alternative 
hypothesis was related in this case 
to the number of students who succeeded at 
the superordinate skill, so that high difficulty 
levels, as expected, reduced the effective 
difference between null and alternative 
hypotheses, and hence also drastically reduced 
the statistical power. Unfortunately, 
this situation was often complicated by large 
numbers of experimental errors, both for 
guessing and chance mistakes, occurring in 
many of the more difficult skills. Although 
this increased the critical number of 0/2 
cell exceptions, and hence the probability of 
accepting the relevant connection, it also 
reduced substantially the power of the test. 
There is some consolation in these results, 
however, that the observed number of ex- 
ceptions was generally much lower than that 
expected, and that the results for both vali- 
dation studies, and indeed all parallel tests 
for different subdivisional skills, were in- 
variably consistent. 
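
The first difficulty noted above, that some connections could not possibly be rejected, follows from simple counting: the 0/2 cell can never hold more students than the total who failed the subordinate skill. A minimal Python sketch of that check, with a hypothetical function name and invented numbers, is:

```python
def rejection_possible(n_failed_lower, critical_number):
    """Return True if the connection could be rejected at all.

    The observed 0/2 exceptions cannot exceed the number of students
    with neither lower-skill question correct, and rejection requires
    the exceptions to exceed the critical number.
    """
    return n_failed_lower > critical_number


# Invented illustration: 4 students failed the lower skill, but the
# critical number is 6, so no outcome could lead to rejection.
untestable = not rejection_possible(n_failed_lower=4, critical_number=6)
testable = rejection_possible(n_failed_lower=14, critical_number=6)
```

A connection flagged as untestable in this sense has no real chance of rejection, whatever the nominal power reported for it.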

Methodological issues aside, the predomi- 
nantly similar patterns of hierarchical learn- 
ing, independently established in parallel 
studies for high school students in Victoria 
and Papua New Guinea, have provided an 
important foundation of cross-cultural evi- 
dence for Gagné's model of hierarchical 
learning. The significance of this result is 
probably enhanced by the fact that it covers 
a difference of several years in age and nominal 
academic level, and substantial differences 
in cultural and educational background. 
Moreover, the difference in cultural 
background between Australian students 
and those in Papua New Guinea appears to 
be particularly pronounced for quantitative 
intellectual skills of the type defined in this 
research. Prince (1969) observed, for example, 
that Western scientific concepts were 
“almost nonexistent” in the indigenous cultures 
of Papua New Guinea, and that mathematical 
concepts were “even more rudimentary.” 
Johnson (1968) also observed that 
concepts such as quantity, space, and time 
were “not expressed in indigenous New 
Guinean culture or its languages [p. 21],” 
and King (1970) has more recently suggested 
that children in Papua New Guinea 
might not form mathematical concepts “ei- 
ther in the same order or way, or at the same 
chronological age as their counterparts in 
modern Western society [p. 58].” This re- 
search has established at least that the pat- 
tern of acquisition for certain basic skills of 
graphical interpretation is substantially the 
same. 

The difference in formal educational 
background between Australian students 
and those in Papua New Guinea is much 
less pronounced than that of cultural back- 
ground, but nevertheless important. Al- 
though English, for example, is the common 
educational language in both areas, for in- 
digenous students in Papua New Guinea it 
is very much a second language, rarely 
spoken outside the classroom. In addition, 
both primary and secondary curriculum 
studies in Papua New Guinea, although es- 
sentially prepared by Australian and over- 
seas expatriate teachers, have been devised 
with some consideration for the character- 
istic needs of a relatively primitive and seg- 
regated rural society (Ralph, 1968). Thus, 
very few indigenous students, and none of 
those involved in this research, would have 
experienced a similar course of study to that 
used in Victorian schools. 

In spite of these differences in curricular 
background, however, it must be admitted 
that more than half of the secondary teach- 
ers in Papua New Guinea are Australians or 
overseas expatriates, as are many of the 
primary teachers and most of the teacher 
training staff (Education Department of 
Papua New Guinea, 1973). Moreover, the 
use of stringent academic selection proce- 
dures, which in turn cause a drastic reduc- 
tion in the number of students continuing 
from primary to lower secondary levels, 
means that the Form 3 students involved in 
this study probably represent, in the West- 
ern sense, an academic elite among the in- 
digenous population. It could therefore be 
argued that the common pattern of learning 
observed in this research might simply be 
a product of successful acculturation for the 
indigenous students of Papua New Guinea 
in a basically Western system of formal ed-
ucation. There is no complete counter argu-
ment, since it would obviously be impossible 
to conduct a strictly comparative validation 
study in any of the many vernacular modes 
of Papua New Guinea. However, the evi- 
dence provided by Prince (1967) and Mac- 
kay and Gardner (1969), concerning char- 
acteristic difficulties with scientific concepts 
and quantitative skills among relatively sen-
ior and academically successful students
in Papua New Guinea, emphasizes the over- 
riding influence of traditional culture on the 
learning of complex and formalized intellec- 
tual skills, and therefore counteracts, at 
least to some extent, the argument of suc- 
cessful acculturation. 

It seems appropriate at this stage to make 
two cautionary notes regarding the poten- 
tial generalizations of this result. In the first 
place, learning hierarchies of the type de- 
fined in this research may only be appro- 
priate for generalized or intellectual skills 
(White, 1973b), and other forms of learning, 
such as memorization (Tennyson & Merrill, 
1971), rote learning (Ausubel & Robinson, 
1969), or simple recall of verbal informa- 
tion, may not conform to the same basic 
principle of hierarchical acquisition. Second, 
the validation process used in this research 
was intended only to examine immediate 
application of the relevant intellectual 
skills, and did not incorporate any test of 
long-term meaningful learning. In spite of 
these potential limitations, however, the 
cross-cultural validation of a common learn- 
ing hierarchy of graphical interpretation 
skills for students in Australia and Papua 
New Guinea has made an important con-
tribution of empirical evidence to support
the basic principle of hierarchical learning. 
OUTERDIRECTEDNESS IN CHILDREN OF THREE AGES AS A
FUNCTION OF EXPERIMENTALLY INDUCED
SUCCESS AND FAILURE


DONALD L. MacMILLAN AND DEBORAH L. WRIGHT
Studied the problem-solving style of outerdirectedness in children from three
grades under two conditions: experimentally induced success and failure. The
total sample of 60 subjects was stratified by grade (second, fourth, and sixth)
and sex. Subjects in each of the groups thus created were randomly assigned to 
treatment conditions. Success and failure were experimentally induced by means 
of a puzzle task, and subjects were then required to perform two dependent tasks 
(a puzzle task and a sticker game imitation) from which outerdirectedness was 
inferred. Results supported the hypothesis that children shift from an outer-
directed to an innerdirected problem-solving style as age increases. No sex differ-
ences were found on any dependent measure. The hypothesized effect of failure
resulting in greater outerdirectedness received partial support, in that subjects
evidenced more outerdirected behavior under failure conditions on the criterion
puzzle task. This was possibly due to the similarity between experimental and
criterion tasks. Several methodological issues pertaining to experimental induc-
tion of failure, the task-specific nature of outerdirectedness, and the validity of the
dependent tasks for assessing outerdirectedness were discussed.


Outerdirectedness is defined as a style of
problem solving characterized by a reliance
on concrete situational cues rather than by
active attempts to deduce abstract rela-
tionships. In essence, this means that the
individual whose problem solving is charac-
terized as outerdirected relies unduly on
external cues resulting in little reliance on
his own cognitive resources.

The vast majority of research on outer-
directedness has focused upon mentally
retarded subjects (see Sanders, Zigler, &
Butterfield, 1968; Turnure, 1970a, 1970b,
1970c, 1973; Turnure & Zigler, 1964), since
outerdirectedness as a construct first arose
out of a series of studies by Zigler (1966)
and his associates which grew out of moti-
vational explanations for what previously
had been considered rigidity (Kounin, 1962)
in the retarded. Outerdirectedness is thought 
to be related to two factors: the level of 
cognition attained, or mental age (MA), and 
the degree of success experienced through 
employing whatever cognitive resources a 
child has available (Turnure & Zigler, 
1964). That is, the lower the MA, the more 
outerdirected one would be, and the more 
failure an individual has encountered in 
problem solving, the more he comes to dis- 
trust his own solutions and search for cues 
in the environment as guides. Hence, outer- 
directedness is not an inherent part of 
retardation but rather it is seen as an out- 
growth of the excessive failure encountered 
by the retarded. The evidence to date on the 
role of MA (Achenbach & Zigler, 1968; 
Massari & Mansfield, 1973; Yando & 
Zigler, 1971) is inconclusive with only 
Yando and Zigler (1971) demonstrating an 
MA effect. The role of failure has been well 
documented (Green & Zigler, 1962; Turnure, 
1970a, 1970b, 1970c; Turnure & Zigler, 
1964). 

Children with histories of success (such 
as the children serving as subjects in the 
present study) are thought to shift develop-
mentally from outerdirected to inner-
directed problem-solving strategies (Achen-
bach & Zigler, 1968; Turnure, 1973). One
question tested by the present study was 
whether exposure to experimentally induced 
failure would result in normal children 
reverting back to an outerdirected style as 
they perceive their own attempted solutions 
to the tasks as inadequate. 

In the research conducted to date, non-
orienting behavior refers to instances in which
the subject looks away from the experi-
mental task. For example, in the Turnure
and Zigler (1964) study, the subject was to 
assemble one puzzle while the experimenter 
assembled a second puzzle. Observation of 
the experimenter’s activity could be helpful, 
detrimental, or of no consequence to the 
subject on a subsequent task to be assigned.
The inference that the nonorienting behavior 
reflected "outerdirectedness" rather than 
“distractibility” depends on whether the 
second experimental task is performed more 
rapidly (or more slowly if the experimenter 
activity was irrelevant to the solution of the 
second task) than the first task and/or more
rapidly than a control group in which no 
experimenter activity was evidenced. 

In a recent study (MacMillan & Cauffiel, 
1973), the outerdirected construct was ex- 
tended to children of average ability with 
academic histories of failure (i.e., variously 
called underachievers, or children with learn- 
ing disabilities). The results of that study 
were supportive of the role of failure in de- 
termining outerdirectedness but raised cer- 
tain questions regarding the task-specific 
nature of outerdirectedness. That is, whether 
a subject is deemed outerdirected in his 
problem solving may depend on the specific 
nature of the dependent task used to meas- 
ure outerdirectedness. These methodological 
problems are considered in the present 
study. 


METHOD 


Subjects


A total sample of 60 subjects was stratified by
grade level. Samples of 10 males and 10 females
were drawn at random from two schools in a single
urban school district from among each of the
following grade levels: second, fourth, and sixth
(chronological age, CA, ranges of 7 years 11 months
to 8 years 10 months, 9 years 11 months to 10 years
10 months, and 11 years 11 months to 12 years 10
months, respectively). The school district in which
the study was conducted is one of three districts
in an urban area and serves primarily lower-
middle-class families. No children were used who
displayed any motor, perceptual, or sensory im-
pairments, nor were any children used who re-
ceived any specialized remedial help. Three male
and three female subjects had Spanish surnames
with the remainder of the subjects being Anglo.
Table 1 contains means and standard deviations
for CA by groups. Hence a 3 X 2 (Grade Level X
Sex) sampling design was created.


TABLE 1
CHRONOLOGICAL AGE OF SUBJECTS

[Table body not recoverable from the source. Surviving labels: Grade level; Female; M; SD.]

Note. N = 60.


Procedures 


Children were seen individually in their own
school on one occasion by a female examiner who
was unaware of the hypotheses being tested. Sub- 
jects within each cell (e.g., second-grade males) 
were assigned at random to one of two treatments: 
success or failure. 
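
The random assignment described above can be sketched in Python. This is only an illustration, not the authors' procedure: the article says that subjects within each grade-by-sex cell were assigned at random to success or failure, and its later mention of 15 males under failure conditions is consistent with five subjects per cell per treatment. The even split, the subject labels, and the function name `assign_treatments` are assumptions introduced here.

```python
import random

GRADES = ("second", "fourth", "sixth")
SEXES = ("male", "female")


def assign_treatments(subjects_by_cell, seed=None):
    """Randomly split each grade-by-sex cell evenly between the two treatments.

    subjects_by_cell maps (grade, sex) to a list of subject identifiers.
    Returns a dict mapping each subject identifier to "success" or "failure".
    The even split is an assumption, not something the article states.
    """
    rng = random.Random(seed)
    assignment = {}
    for subjects in subjects_by_cell.values():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "success"
        for subject in shuffled[half:]:
            assignment[subject] = "failure"
    return assignment


# Hypothetical subject labels: 10 per cell, 60 in total.
cells = {
    (grade, sex): [f"{grade}-{sex}-{i}" for i in range(1, 11)]
    for grade in GRADES
    for sex in SEXES
}
assignment = assign_treatments(cells, seed=0)
```

Shuffling within each cell, rather than across the whole sample, keeps grade and sex balanced across the two treatments.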

Interpolated success. Three pictures (a duck, a 
sailboat, and an airplane) were divided into three 
pieces. In a pilot study, these designs had been 
found to be easily recognized as to what they were 
a part of, and they were easy to assemble for 
children in the age range of subjects used in this 
study. Under the success condition, the experi-
menter allowed the subject to complete the as-
sembly of the puzzle and then clicked the stop-
watch. The experimenter then removed the com-
pleted puzzle and said, “That was very good.
You're good at putting puzzles together.” Then 
the second puzzle was given to the subject and 
again he was allowed to complete it, whereupon 
the examiner said, “You did very well on this 
puzzle, could you put another one together for 
me?” The third puzzle was then presented and 
upon completion, the child was told, “You put 
this puzzle together better than anyone I have 
asked. You're very good at this.” 

Interpolated failure. Those subjects selected to 
receive the failure treatment were not allowed to 
complete any of the three puzzles. The puzzles 
used with the failure subjects were divided into
20 pieces each, and the child was given one minute
in which to assemble it. Data from a pilot study 
revealed that these puzzles could not be assembled 
in one minute by children of the ages used in this 
study. After the subject was stopped on the first
puzzle, the examiner said, “You did not finish the
puzzle. You should have been able to finish it
before your time was up. Since you didn’t finish
I’ll give you another puzzle.” Then the second
puzzle was presented, and when the child was
stopped again, he was told, “Well, I see you did
not finish it before the time was up. You didn’t
finish this one either.” Finally, the third puzzle
was attempted and again the child was stopped
prior to completing the puzzle, whereupon he was
told, “You did not do very well on these puzzles.
You must not be very good at putting puzzles to-
gether. All the other children I asked to put these
together did them correctly before the time was
up.”

Dependent measures. Following the administra- 
tion of success or failure protocols, each subject 
was then given two tasks: puzzle task and sticker
game task (after Turnure & Zigler, 1964). The 
order of these tasks was randomly determined for 
each subject. 

The puzzle task consisted of two puzzles 
(adapted from the horse and elephant from the 
Wechsler Intelligence Scale for Children, WISC, 
Object Assembly). The experimenter explained, 
“Here are some pieces of a puzzle. When you put 
them together, they will make something you 
know. I want you to put them together as quickly 
as you can. While you are putting yours together, 
I will put one together too. But you put yours 
together as fast as you can. Any questions? Okay, 
here’s your puzzle. Begin.” 

As soon as the subject began working, the ex-
perimenter started the stopwatch. While the sub-
ject assembled the first puzzle (which was ran-
domly determined by the subject), the experi-
menter assembled the other puzzle. The experi-
menter left the completed puzzle in view of the
subject for 10 seconds, then disassembled it and
left the pieces in view for 30 seconds. If the subject
had not completed work on the first puzzle, the
experimenter repeated the cycle with the puzzle.

Two dependent measures were recorded while
the subject worked on the first puzzle task: (a)
time required to assemble the puzzle correctly
and (b) the number of times the subject glanced
at the experimenter or the experimenter’s puzzle.

The subject was then given the second puzzle
(the one the experimenter assembled while the
subject assembled the first puzzle) and told,
“Here is another puzzle to put together as quickly
as you can. Do it as fast as you can. Any ques-
tions?” While the subject assembled the second
puzzle, time and number of glances were again
recorded.

A second set of dependent measures was de-
rived from the sticker game task, which followed
the protocols of Turnure and Zigler (1964). In
this task, the examiner opened a box of stickers of
several colors (black, green, red, and yellow) and
took a sheet of white paper. The experimenter
then explained, “I’ll make a design out of these
pieces of sticker paper on this white sheet first,
then you may make a design.” The experimenter
then made one of three predetermined designs
with one of the colors of stickers (again randomly
determined). Upon completion of the design, the
experimenter named her design, “I’ll call my de-
sign a ______ (either a bow-wow dog, a pretty tree,
or a go-go-cart). Now you may make any design
you wish, with any color paper. After you have
finished, you may give your design a name which
I will write at the bottom of your design.” This
procedure was repeated for the other two designs.

Hence, from the sticker game task the following 
measures were taken on each of the three designs: 
(a) the color of stickers used, (b) the design itself, 
and (c) the name given the design by the subject. 
One point was given for each of these three ele- 
ments if they corresponded to the color, design, 
and name used by the experimenter, making a 
total of nine points possible over the three designs. 
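
The scoring rule just described (one point per matching element, nine points over three designs) can be sketched in Python. This is an illustrative sketch only: the class and function names are hypothetical, the example responses are invented, and exact string equality stands in for the examiner's judgment of whether a design or name corresponded to the experimenter's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One sticker design, described by the three scored elements."""
    color: str
    shape: str
    name: str


def imitation_score(experimenter, subject):
    """Return 0-3: one point per element matching the experimenter's design."""
    return (
        int(subject.color == experimenter.color)
        + int(subject.shape == experimenter.shape)
        + int(subject.name == experimenter.name)
    )


def sticker_game_total(trials):
    """Sum imitation scores over (experimenter, subject) pairs (0-9 for three trials)."""
    return sum(imitation_score(e, s) for e, s in trials)


# Hypothetical responses for one subject across the three designs.
trials = [
    (Design("red", "dog", "bow-wow dog"), Design("red", "dog", "bow-wow dog")),
    (Design("green", "tree", "pretty tree"), Design("green", "house", "my house")),
    (Design("black", "cart", "go-go-cart"), Design("yellow", "flower", "daisy")),
]
total = sticker_game_total(trials)
```

Higher totals indicate more imitation of the experimenter and therefore, on the article's interpretation, greater outerdirectedness.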

Following the completion of the testing, all 
subjects were debriefed and told that the inter- 
polated success and failure were not a true reflec- 
tion of their performance on the preexperimental 
puzzles. 


RESULTS 


Each of the dependent measures was 
analyzed separately in order to evaluate 
differences due to grade levels, treatments, 
sex, and the interactions thereof. Data for 
the first two measures, sticker game and 
glancing, were subjected to a three-factor 
analysis of variance (Grade X Treatment X 
Sex). The third measure, time spent on the 
puzzles, was subjected to a four-factor 
analysis of variance with repeated measures 
on the fourth factor, puzzles. The .05 level 
of significance was adopted for all statistical 
tests. Summary data for all dependent 
measures are shown in Table 2. 

The hypothesized difference for grade 
level was supported by all three dependent 
measures. Significant differences were found 
for all dependent measures (sticker game,
F = 4.81, df = 2, p < .05; glancing, F = 
3.96, df = 2, p < .05; time on puzzles, 
F = 6.24, df = 2, p < .05). Post hoc tests 
were performed according to Tukey’s test 
for honestly significant differences (HSD) 
and revealed a significant difference between 
second-grade and sixth-grade subjects on 
both the sticker task (q = 4.36, p < .05) 
and number of glances (q = 3.92, p < .05). 
On the time required to complete the
puzzles, it was found that sixth-grade sub-
jects took significantly less time to complete
the puzzles than both second-grade and
fourth-grade subjects. In short, the develop-
mental hypothesis, that as children get
older they become less reliant on external
cues, was supported by the results for all
three dependent measures.


TABLE 2
MEANS AND STANDARD DEVIATIONS FOR ALL DEPENDENT MEASURES BY GRADE, SEX, AND TREATMENT

                         Time: Puzzle 1   Time: Puzzle 2    Time: Total       Glancing      Sticker game
Grade/sex   Treatment      M       SD       M       SD       M        SD      M      SD      M      SD
2nd
  Male      success      27.20   11.69    17.00    6.60    44.20    16.24    1.20   1.30    2.40    --
            failure      60.40   37.78    63.20   60.87   125.60    62.42    2.60   1.52    2.60    --
  Female    success      65.00   57.92    27.20   12.87    92.20    59.98    2.20   1.79    3.00    --
            failure      53.60   32.05    31.60   32.83    85.20    41.09    3.00   2.55    3.20    --
4th
  Male      success      20.00    9.35    14.00    4.42    34.00    11.38     .20    .45    1.60    --
            failure      68.40   61.65    62.20   83.66   130.60    96.78    1.40   2.07    2.00    --
  Female    success      68.00   51.43    47.60   41.79   115.60    75.99    1.80   1.79    1.80    --
            failure      44.40   59.37    46.00   45.74    90.40    65.49    1.80   2.95    3.00    --
6th
  Male      success      22.60   20.01     9.60    2.40    32.20    20.98     .40    .55    1.40    .89
            failure      37.20   30.61    11.00    4.53    48.20    35.11    1.60    .89    1.40    .89
  Female    success      25.40   17.29    13.80    6.30    39.20    22.06     .20    .45    2.00   1*
            failure      27.40   15.66    14.40    6.80    41.80    21.00    1.00   1.22    1.60   1.82

Note. -- = value missing from the source. * = value truncated in the source.

The hypothesized effect of interpolated 
failure resulting in more nontask orientation 
received partial support. On the sticker task, 
the effect of treatments failed to reach 
statistical significance. However, on both
dependent measures derived from the puzzle
task (glances and time), a significant effect
for treatments was found (glancing, F =
4.43, df = 1, p < .05; time, F = 4.21,
df = 1, p < .05). In both instances, inter- 
polated failure resulted in subjects glancing 
more frequently and taking more time to 
complete the puzzles than was the case 
for subjects performing under the inter- 
polated success condition. 

The third hypothesis, that girls would be 
more outerdirected than boys, failed to 
receive support from any of the dependent 
measures. However, on one dependent 
measure (time to complete tasks), a signifi- 
cant interaction between treatment and 
sex (F = 7.84, df = 1, p < .05) was found.
Means and standard deviations by cells are
shown in Table 3 for this interaction. Post
hoc tests revealed that males took signifi-
cantly longer to complete puzzles under the
failure condition than under success condi-
tions (t = 3.43, p < .05), and females took
significantly longer than males under the
success condition (t = 2.44, p < .05).

TABLE 3
MEANS AND STANDARD DEVIATIONS BY CELLS FOR
TREATMENT X SEX INTERACTION ON
TIME TO COMPLETE PUZZLES

                  Treatment
Sex          Success     Failure
Males
  M           36.80      100.80
  SD          16.37       74.48
Females
  M           78.66       65.80
  SD          64.51       47.52


TABLE 4
CELL FREQUENCIES FOR THE RELATIONSHIP
BETWEEN SCORES ON THE THREE
DEPENDENT MEASURES

                        Glances on Puzzle 1      Sticker score
Measure                   0-1        2-5
Time on Puzzle 2
  0-13 seconds             19          3          17       5
  14-24 seconds            14          7          11      10
  25+ seconds               8          9           7      10
Glancing behavior
  0-1                                             26      15
  2-5                                              9      10

Note. The labels of the two sticker-score categories are not legible in the source.


In order to test for a relationship among
dependent measures, a chi-square test was
run. The decision to use a nonparametric
test was based on the fact that the scores
derived from both glancing and the sticker
game were ordinal at best and the distribu-
tions were not continuous. Since all three
dependent measures used in this study have
been used either singly or in combination in
previous studies, it was judged important to
ascertain whether they were tapping related
processes. In Table 4 are the cell frequency 
counts upon which the chi-square was com- 
puted on time required to complete Puzzle 2 
and glancing behavior, time required to 
complete Puzzle 2 and sticker game scores, 
and glancing behavior and sticker game 
scores. It should be noted that time on
Puzzle 2 was used as that score is the one
presumably affected by the number of
glances (i.e., more glancing at the second
task while assembling the first puzzle should
increase the speed with which the second
puzzle is assembled) during the assembly of
the first puzzle. Results revealed a reliable
relationship between the time required to
complete the second puzzle and glancing
behavior (χ² = 6.87, df = 2, p < .05);
however, visual inspection of the frequencies
in Table 4 reveals that the relationship which
was found was the opposite of the one pre-
dicted. Namely, subjects who rapidly com-
pleted Puzzle 2 tended to be those who
glanced very infrequently while assembling
the first puzzle. The other analyses failed
to uncover a statistically significant rela-
tionship between the other dependent meas-
ures (time on second puzzle vs. sticker
game, χ² = 5.65, df = 2, .10 > p > .05;
sticker game vs. glancing, χ² = .79, df = 1,
ns). The relationship between time on the
second puzzle and sticker game score, while
not reaching the adopted level of signifi-
cance, was again not in the predicted
direction. Subjects who were most imitative
on the sticker game (i.e., were outerdirected)
were not inclined to complete the second
puzzle very rapidly.
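
The chi-square values reported above can be approximately recomputed from the cell frequencies in Table 4. The Python sketch below assumes the Pearson chi-square statistic, with Yates's continuity correction applied only to the 2 X 2 table; the article does not say whether a correction was used, so that choice is an assumption made here, as are the function and variable names.

```python
def chi_square(table, yates=False):
    """Pearson chi-square for a contingency table given as a list of rows.

    With yates=True, 0.5 is subtracted from each |observed - expected|
    (continuity correction, normally used only for 2 x 2 tables).
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Cell frequencies from Table 4.
# Rows: time on Puzzle 2 (0-13, 14-24, 25+ seconds); columns: glances 0-1, 2-5.
time_by_glances = [[19, 3], [14, 7], [8, 9]]
# Rows: time on Puzzle 2; columns: the two sticker-score categories.
time_by_sticker = [[17, 5], [11, 10], [7, 10]]
# Rows: glances 0-1, 2-5; columns: the two sticker-score categories.
glances_by_sticker = [[26, 15], [9, 10]]

chi_tg, df_tg = chi_square(time_by_glances)
chi_ts, df_ts = chi_square(time_by_sticker)
chi_gs, df_gs = chi_square(glances_by_sticker, yates=True)
```

Under these assumptions the three statistics come out near 6.89, 5.61, and 0.79, close to the reported 6.87, 5.65, and .79; the small differences may reflect rounding in the original hand computation.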

A follow-up questionnaire provided some 
additional data of interest. When asked if 
the puzzles were easy or hard, the vast 
majority of subjects who performed under 
the success condition reported that they 
were easy and indicated they would like to 
“play the game again.” However, those who 
performed under the failure condition 
attributed their failure to complete the 
preexperimental puzzles to either character- 
istics of the puzzles (e.g., they had too many 
pieces) or their own inadequacies (e.g., I'm 
not too good at puzzles). Only 3 of the 15 
males under failure conditions blamed the 
puzzles for the incompletion (all 3 were 
second-grade children). The responses of 
females were approximately evenly divided 
at all three grade levels. 


DISCUSSION 


The present study extended the outer- 
directed construct to normal-achieving chil- 
dren. In so doing, an attempt was made pri- 
marily to test several assumptions under- 
lying the previous work with mentally 
retarded subjects. Specifically, the notion 
that outerdirectedness is a function of 
failure was tested by means of interpolating 
success and failure prior to performance on 
criterion tasks.

In general, the findings of the present 
study were supportive of the developmental 
trend reported elsewhere (MacMillan & 
Cauffiel, 1973; Yando & Zigler, 1971; 
Zigler & Yando, 1972); namely, that as 
children get older they become less outer- 
directed. This finding seems to be a reliable 
one as Yando and Zigler (1971) used differ- 
ent populations (institutionalized and non- 
institutionalized retardates classified as 
familial and organic) as did Zigler and 
Yando (1972) whose subjects were institu- 
tionalized and noninstitutionalized retarded 
and normal. The same trend was reported 
by MacMillan and Cauffiel (1973) with 
educationally handicapped and normal chil- 
dren. In the present study, the decrease in 
outerdirected behavior was evidenced on 
both the sticker game task and the glancing
measures, whereas time spent on puzzles
failed to reveal this gradual progression
toward less outerdirectedness. In light of
the similar findings by MacMillan and
Cauffiel (1973), Yando and Zigler (1971),
and Zigler and Yando (1972) with the
sticker game task, it would appear that this
task is sufficiently difficult and/or ambiguous 
to tap outerdirectedness, particularly since 
the cognitive demands of this task are 
minimal. 

The findings of the present study, how- 
ever, raise some doubt as to the validity of 
the interpretation of greater speed on the 
second puzzle as indicative of a shift from
outerdirectedness to innerdirectedness. First, 
the nature of the relationship between 
glancing behavior and time required to 
complete the second puzzle fails to confirm 
the notion that the glancing behavior is 
beneficial. One might also argue that the 
increased speed of puzzle assembly is simply 
a function of increased “power” as MA 
and/or CA increased. 

The hypothesized sex difference was not 
supported by the results of this study. No 
main effect for sex was found on any of the 
three dependent measures. One finding of 
interest, despite the fact that it had not 
been predicted, was the interaction of 
Treatment X Sex on the puzzle-time meas- 
ure. On this task, girls were more outer- 
directed under success conditions than were 
boys, and conversely, boys were significantly 
more outerdirected under the failure condi- 
tion. This greater outerdirectedness by boys 
in the failure condition is consistent with 
previous findings (Achenbach & Zigler, 
1968; Turnure, 1970). 

Findings related to success and failure 
treatments were the most intriguing. First, 
the hypothesized effect of failure increasing 
outerdirectedness was supported by the 
results for glancing and puzzle time but not 
for the sticker game task. It must be noted 
that the preexperimental tasks with which 
failure was induced were puzzles, and the 
two dependent measures for which failure 
altered performance were both derived 
from a similar puzzle task (i.e., glancing and 
time). On the sticker game task, the effect 
of failure failed to reach significant propor- 
tions. This finding underscores the point 
made by Butterfield and Zigler (1965) that 
different degrees of similarity between tasks
on which success and failure are induced and criterion
tasks may account for inconsistent findings
between studies which experimentally induce 
success and failure. A similar transference 
was noted by MacMillan and Cauffiel (1973) 
and by Turnure and Zigler (1964). In light 
of the inconclusive findings with regard to 
the effect of experimentally induced failure 
on performance (MacMillan & Cauffiel, 
1973; Steigman & Stevenson, 1960; Steven- 
son & Pirojnikoff, 1958), greater attention 
need be given to differing methods of in- 
ducing failure, transference effects to vari- 
ous dependent tasks, differential percep- 
tions of the reason for failure by populations 
differing in their histories of failure (Mac- 
Millan & Keogh, 1971), and the possibility 
that a given failure experience may have 
different potency for populations differing 
in their histories of failure (MacMillan & 
Cauffiel, 1973). 

The failure to find the predicted rela- 
tionships between dependent measures raises 
doubts concerning whether they are tapping 
related processes. The statistically signifi- 
cant relationship found between glances on 
Puzzle 1 and the time required to complete 
Puzzle 2 is of even more concern as it appears 
to indicate that subjects who glance fre- 
quently during Puzzle 1 are not glancing 
at Puzzle 2 but at stimuli that are either of 
no value for the immediate task or interfere 
with the assembly of Puzzle 2. In fact, the 
present results seem to be more reasonably 
interpreted as supportive of distractibility 
than of outerdirectedness. The very tenta- 
tive finding on the relationship between 
time on Puzzle 2 and sticker game score 
seems to indicate that less imitative sub-
jects were faster on Puzzle 2 and the more 
imitative were slower. Coupling these find- 
ings with the failure to find any significant 
relationships between these same dependent 
measures with underachieving males (Mac- 
Millan & Cauffiel, 1973) questions the 
validity of these tasks for assessing outer- 
directedness. 

The follow-up interview used in this study 
revealed that a majority of boys who were 
not allowed to complete the puzzles in the 
preexperimental phase of the study (per-
forming under the failure condition) attrib-
uted their failure to complete the puzzle to
characteristics inherent in the puzzle rather 
than their own inabilities. The three boys 
who blamed themselves for failing to com- 
plete the puzzles were second graders, which 
is the opposite of the predicted direction of
the relationship between internalizing blame 
and outerdirectedness. 

In summary, the findings of the present 
study highlight the need for further research 
on three dimensions: the task specific nature 
of outerdirectedness, the effects of failure as 
a result of the precise manner in which fail- 
ure is induced, and the validity of the de- 
pendent measures used to assess outer- 
directedness. 
TRAINING LETTER DISCRIMINATION BY PRESENTATION OF
HIGH-CONFUSION VERSUS LOW-CONFUSION ALTERNATIVES


ROSEMERY O. NELSON AND KENNETH S. WEIN


University of North Carolina at Greensboro 


Two groups of preschool children (n = 8 per group) were taught letter 
discriminations by means of a matching-to-sample task. The high- 
confusion group were presented with letter alternatives that were 
highly confusable, based on E. J. Gibson's reading research. The low- 
confusion group were presented with letter alternatives that were less 
likely to be confused. The performance on a posttest of letter dis- 
crimination of both training groups was superior to a no-treatment 
control group. While the high-confusion group required more training 
trials to criterion, its posttest performance was better than the low- 
confusion group (p < .06). These results were discussed in terms of 
a high-confusion training procedure producing better discriminations 
among distinctive features of letters. 


Reading as an academic behavior has 
been considered theoretically within the 
framework of operant psychology by Bloom 
(1973), Goldiamond and Dyrud (1966), 
Skinner (1957), and Staats (1968). Al- 
though Gibson’s theory of reading is usually 
placed within a cognitive framework (Wil- 
liams, 1973), her theory is nonetheless 
amenable to experimentation employing an 
operant model of stimulus-response conse- 
quence. The present study is concerned with 
varying antecedent conditions to ascertain 
which were most likely to produce desirable 
reading responses. 

Gibson’s analysis of the reading process 
(Gibson, 1969) consisted of four phases: 
learning receptive and expressive language 
skills; learning to differentiate graphic symbols 
(i.e., letters); learning to decode letters 
into sounds (i.e., learning both variable and 
constant letter-sound relationships); and 
using progressively higher-order units of 
structure, including pronounceability, and 
semantic and syntactic constraints. Gibson's 
second phase, that of learning letter discrimination, 
is of primary interest in this 
study. Gibson held that letter discrimination 
is facilitated by recognition of the critical 
or distinctive features of letters. Her research 
(Gibson, Gibson, Pick, & Osser, 
1962; Gibson, Osser, Schiff, & Smith, 1963) 
produced a set of stimulus materials, including 
both real letters and artificial letterlike 
forms (which can be categorized as highly 
confusable versus infrequently confusable), 
and also including a set of graphemic features 
that can be categorized as distinctive 
versus nondistinctive. 

Four studies, using this set of stimulus 
materials, have concluded that the learning 
of distinctive features greatly facilitates 
letter discrimination. Pick (1965), using 
Gibson letterlike forms, taught three groups 
of kindergarten children to distinguish 
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comparable transformations was superior to 
transfer tasks involving both different stan- 
dards and different transformations, or the 
same standards and different transforma- 
tions. Williams (1969), also using the Gib- 
son letterlike forms with kindergarten chil- 
dren, employed three training methods dur- 
ing delayed matching to sample: discrimi- 
nation training in which different forms 
were offered as alternative matches to the 
standard; discrimination training in which 
different transformations of the same form 
were offered as alternative matches to the 
standard form; and reproduction training in 
which the standard form was traced and 
copied. The middle group made the most 
errors during training, but performed sig- 
nificantly better than either of the other 
two groups on a transfer task also involving 
discrimination of letterlike forms. Tawney 
(1972) found that reinforcing four-year-old 
children to respond to critical dimensions of 
letterlike forms produced better perform- 
ance on a matching-to-sample task involv- 
ing real letters than reinforcing responses to 
noncritical dimensions. Samuels (1973) 
found that letter discrimination training 
that forced attention to distinctive features 
of letters facilitated learning letter names 
more than discrimination training not in- 
volving distinctive features or mere exposure 
to letters. 

The present study differed from the first 
three summarized above in that real letters 
were used in both the training and transfer 
(posttest) tasks. The advantage of using 
real letters is that since these are the stimu- 
lus materials found in typical classrooms, 
the results of the study can be more readily 
generalized to the classroom. The hypothe- 
sis being tested, derived from the studies 
above, was that discrimination training in- 
volving highly confusable letters would pro- 
duce better results on a transfer task (post- 
test) than discrimination training involving 
less confusable letters, because the former 
training would require greater attention to 
the distinctive features of the letters. The 
experimental design consisted of three 
groups of children: an experimental group 
taught to discriminate among low-confusion 
alternatives; an experimental group taught 
to discriminate among high-confusion alternatives; 
and a control group who were pre- 
and posttested with no additional training. 


METHOD 


Subjects 


Fifty-three children, ages 2½ to 4½ years, who 
attended four local day care centers, completed a 
pretest of letter discrimination. Several other chil- 
dren were unable to complete pretesting due to 
crying or being unable to match colors during the 
familiarization slides. Of the 53 children, those 
who made at least 13 errors on the pretest consist- 
ing of 58 items were divided into three matched 
groups. Matching was done on the basis of age, 
sex, number of pretest correct responses, and the 
day care center which the child attended. For the 
high-confusion experimental group (N = 8: 5 fe- 
males and 3 males), the mean age was 45.5 months, 
and the mean number of pretest correct responses 
was 40.125. For the low-confusion experimental 
group (N = 8: 5 females and 3 males), the mean 
age was 44.6 months, and the mean number of 
pretest correct responses was 40.500. For the con- 
trol group (N = 9: 5 females and 4 males), the 
mean age was 43.3 months, and the mean number 
of pretest correct responses was 38.667. Two children 
were unable to complete the study: one child 
in the high-confusion experimental group cried 
during training, and one child in the control group 
became sick; these two children were replaced by 
two others so as to complete the subject pool 
described above. 


Materials 


Testing slides. The same set of slides was used 
for pre- and posttesting. There were 15 “familiarization” 
slides [...] could either be matched [...]; 
the first three slides had only two alternatives, the 
next four slides had three alternatives, the next 
four slides had four alternatives, and the final four 
slides had five alternatives. The actual pre- and 
posttest consisted of 58 slides containing two du- 
plicate sets of 29 slides, in order to assess more 
reliably the child's skill in matching to sample by 
using letter stimuli. The slides were photographs of 
Roman capital letters. Each pre- and posttest letter 
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six letters (A, F, J, P, S, and Z) did not appear as 
stimuli to be matched because they were not con- 
fused at least four times with two alternatives. On 
the other hand, if a letter was confused at least 
four times with three or four other letters, it be- 
came the stimulus in two different slides. A letter 
was judged to be of low confusion with another 
letter if both were confused no times in Gibson's 
Confusion Matrix I. Since there were many alter- 
native low-confusion choices for any stimulus 
letter, a second criterion that had the least num- 
ber of common distinctive features was employed, 
as determined by Gibson’s chart of distinctive 
features for letters (Gibson, 1969, p. 88). As an 
example, M is highly confused with W and N, and 
rarely confused with L and Q. 

Training slides. For each training group (high 
confusion and low confusion), there was also a set 
of 29 training slides using matching to sample 
with Roman capital letters. Each training slide 
consisted of the letter to be matched and three 
alternative choices in a horizontal row beneath it. 
For the high-confusion training group, the three 
alternatives (in random order from slide to slide) 
were the correct response and two high-confusion 
alternatives. For the low-confusion training group, 
the three alternatives (in random sequence from 
slide to slide) were the correct response and two 
low-confusion alternatives. Six color matching 
slides were also used as "warm-up" slides prior to 
each training session for both groups. 

Apparatus. All slides were projected onto a wall 
by a Kodak Carousel projector located a constant 
distance from the wall. Correct responses were followed 
by poker chips, which were used as tokens. 
The child deposited poker chips into a jar; all 
chips in the jar were exchanged at the end of the 
session for pieces of chocolate candy. 


Procedure 


[...] familiarization slides, a correction procedure was 
employed. If the child made an error, [...] 
discontinued from the study at this point. The 58 pretest 
[...] and five alternatives [...] 


Training. Of the 53 children who completed pretesting, 
those who made 13 or more errors on the 
58 slides were matched into one of three groups: 
high-confusion training, low-confusion training, or 
a control group who merely continued to attend 
the day care centers and was pre- and posttested 
without additional training. For low- and 
high-confusion training, each training session was 
conducted on an individual basis in the same room 
and using the same apparatus as during pretesting; 
the trainers were two female graduate students 
who trained the children on alternate days. For 
both low- and high-confusion training, each training 
session began with six warm-up slides involving 
matching to sample using colors. For the first 
training day, these warm-up slides were also used 
to teach the value of the tokens by having the 
child exchange tokens for candy at the end of 
these warm-up slides. Each training session consisted 
of completing the 29 slides appropriate for 
that training group. If the child made a correct 
response, he or she was given a token and [...] 
reinforcement, and the next slide appeared. If the 
child made an incorrect response, the experimenter 
said "no," and the slide projector was turned off 
for five seconds (time out); the same slide 
reappeared. This procedure was followed until the 
child made the correct response for that slide. The 
trainer recorded all incorrect responses on 
coded data sheets. Each session ended when the 
child had correctly matched the letters on all of 
the 29 training slides. The child was then allowed 
to exchange tokens for chocolate candy. The sequence 
of slides for both the high- and low-confusion [...] 


Posttesting. Posttesting was conducted within 
three days of the time when the child completed 
the last training session. For every two experimental 
children who completed training and posttesting, 
one control child was also posttested. The 
same slides were used during posttesting as during 
pretesting; these slides consisted of familiarization 
slides and 58 slides consisting of a stimulus letter 
and five alternative choices from which to select 
the correct match. The same male examiner who 
conducted pretesting carried out the posttesting; 
he remained “blind” as to which group the child 
had been assigned (high confusion, low confusion, 
or control). 


RESULTS 


Trials to Criterion 


The criterion for completion of training 
for each child in the high-confusion and low-confusion 
experimental groups was two consecutive 
days with no errors on the set of 29 
training slides appropriate for each training 
group. The mean trials to criterion (number 
of training days required) for the high-confusion 
group was 6.625; for the low-confusion 
group, 2.750. A t test was performed 
on these mean differences, which revealed 
that significantly more training days (t = 
2.357, df = 14, p < .05) were required for 
the high-confusion group to meet criterion 
than for the low-confusion group. 


TABLE 1 
MEAN NUMBER OF CORRECT RESPONSES, ERRORS, AND TRIALS TO CRITERION 

              |            Pretest                  |            Posttest                 | Trials to criterion 
Group |  N    | Correct(a) | SD   | HC     | LC     | Correct(a) | SD   | HC     | LC     | M           | SD 
      |       |            |      | errors | errors |            |      | errors | errors |             | 
HC    |  8    | 40.125     | 3.68 | 17.250 |  .625  | 54.000     | 3.78 |  3.750 |  .250  | 6.625       | 4.4[...] 
LC    |  8    | 40.500     | 8.18 | 15.875 | 1.625  | 49.125     | 5.25 |  8.750 |  .125  | 2.750       | [...] 
C     |  9    | 38.667     | 4.56 | 18.778 |  .617  | 41.778     | 7.47 | 15.111 | 1.111  | not trained | not trained 

Note. HC = high confusion, LC = low confusion, C = control. 
(a) Maximum score = 58. 


Pretest-to-Posttest Difference Scores 


All subjects were given the same pre- and 
posttest of letter discrimination consisting of 
58 matching-to-sample slides, each with one 
stimulus letter and five alternative letters 
(the match, two high-confusion letters, and 
two low-confusion letters). All groups improved 
from pre- to posttest. The mean pretest, 
posttest, and difference scores for all 
three groups are summarized in Table 1. A 
one-way analysis of variance revealed significant 
differences among the pretest-to-posttest 
difference scores for the three groups 
(F = 8.99, df = 2, 22, p < .01). A Newman-Keuls 
test for multiple comparisons among 
means demonstrated that the high-confusion 
group was different from the control group 
at the .01 level of significance; that the low-confusion 
group was different from the control 
group at the .05 level; and that the 
high-confusion group was different from the 
low-confusion group approaching the .05 
level of significance (p < .06). Thus both 
training groups performed significantly 
better on the posttest (as compared with 
their pretest performances) than the no-treatment 
control group. While the high-confusion 
group required significantly more 
training trials than the low-confusion group, 
their performance was significantly better 
(p < .06) on the posttest of letter discrimination 
than the low-confusion group. 
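
As an illustrative check on the figures reported above, the following minimal sketch recomputes the mean pretest-to-posttest gains from the Table 1 means and the degrees of freedom for a one-way analysis of variance with three groups. The variable and function names are assumptions made for this example; individual scores are not reported, so the F ratio itself cannot be recomputed here.

```python
# Group means transcribed from Table 1 (maximum score = 58).
table1 = {
    "HC": {"n": 8, "pretest": 40.125, "posttest": 54.000},
    "LC": {"n": 8, "pretest": 40.500, "posttest": 49.125},
    "C":  {"n": 9, "pretest": 38.667, "posttest": 41.778},
}


def mean_gain(row):
    """Mean posttest score minus mean pretest score for one group."""
    return round(row["posttest"] - row["pretest"], 3)


def anova_df(group_sizes):
    """Between-groups and within-groups df for a one-way ANOVA."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return k - 1, n_total - k


gains = {group: mean_gain(row) for group, row in table1.items()}
df_between, df_within = anova_df([row["n"] for row in table1.values()])

print(gains)                  # mean gain per group
print(df_between, df_within)  # degrees of freedom for the F test
```

Under these assumptions the gains are ordered HC > LC > C, and the degrees of freedom agree with the reported df = 2, 22.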


Error Analysis 


An examination of Table 1 reveals that 
for all three groups, the most frequent type 
of error made on the pretest was errors be- 
tween high-confusion letters. The criterion 
for a letter to be judged highly confusable 
with another letter, it will be recalled, is 
four or more errors in confusion between 
them in Gibson’s Confusion Matrix I. Since 
during training, the low-confusion group was 
discriminating only among low-confusion 
alternatives, and the high-confusion group 
was discriminating only among high-con- 
fusion alternatives, an analysis of the types 
of error made during the posttest for these 
two training groups was undertaken. The 
means reported in Table 1 indicate that re- 
gardless of type of training procedure, more 
than 90% of posttest errors for all three 
groups were confusions between high-con- 
fusion letters. 
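
The proportion cited above can be checked against the posttest error means in Table 1. The sketch below is only an arithmetic illustration; it assumes that the two posttest error columns of Table 1 are, respectively, high-confusion and low-confusion errors, and the names used are hypothetical.

```python
# Posttest error means from Table 1: (high-confusion errors, low-confusion errors).
posttest_errors = {
    "HC": (3.750, 0.250),
    "LC": (8.750, 0.125),
    "C": (15.111, 1.111),
}


def high_confusion_share(high, low):
    """Percentage of all errors that were confusions between high-confusion letters."""
    return 100.0 * high / (high + low)


shares = {
    group: round(high_confusion_share(high, low), 2)
    for group, (high, low) in posttest_errors.items()
}

for group, share in shares.items():
    print(f"{group}: {share}% of posttest errors were high-confusion errors")
```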


Confusion Matrix 


In order to corroborate Gibson’s Confu- 
sion Matrix I (Gibson et al., 1963), the pre- 
test errors of all 53 children who completed 
pretesting were analyzed. Given the con- 
struction of the pretest, unlike Gibson’s 
matrix, each letter did not have an equal 
opportunity to be confused with every other 
letter. Hence the number of opportunities 
for such confusions was tabulated for each 
letter-pair combination. A percentage mea- 
sure was taken by dividing the actual num- 
ber of confusions for each letter pair by the 
number of opportunities for such confusions. 
The pairs of data in 77 corresponding cells 
produced a Pearson correlation of .783 be- 
tween the confusion matrix derived from the 
pretest data of the present study and Gib- 
son's confusion matrix. 
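
The procedure just described (a percentage measure per letter pair, followed by a Pearson correlation across corresponding cells) can be sketched as follows. The counts below are hypothetical and are not data from this study or from Gibson's matrix; the function names are likewise assumptions for illustration.

```python
from math import sqrt


def confusion_rate(confusions, opportunities):
    """Percentage of opportunities on which a letter pair was confused."""
    if opportunities <= 0:
        raise ValueError("a letter pair needs at least one opportunity")
    return 100.0 * confusions / opportunities


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical cells: letter pair -> (confusions, opportunities) in a pretest.
pretest_cells = {("M", "W"): (12, 53), ("M", "N"): (9, 53),
                 ("M", "L"): (1, 106), ("M", "Q"): (0, 106)}
# Hypothetical reference values for the same cells in a published matrix.
reference_cells = {("M", "W"): 8, ("M", "N"): 6, ("M", "L"): 1, ("M", "Q"): 0}

pairs = sorted(pretest_cells)
observed = [confusion_rate(*pretest_cells[p]) for p in pairs]
reference = [reference_cells[p] for p in pairs]
print(round(pearson_r(observed, reference), 3))
```

Dividing by opportunities matters because, as noted above, the pretest did not give every letter pair an equal chance of being confused.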


DISCUSSION 


This study indicated that training in 
letter discrimination resulted in better post- 
test performance than lack of such train- 
ing. While subjects who were trained by 
making discriminations among high-confu- 
sion letters required more trials to reach 
criterion, their posttest performance was 
better (p < .06) than subjects who were 
trained by making discriminations among 
low-confusion letters. 

These results corroborate those of Pick 
(1965), Williams (1969), Tawney (1972), 
and Samuels (1973), each of whom con- 
cluded that letter (letterlike) discrimination 
was enhanced by teaching children to at- 
tend to the critical features of the stimuli. 
Of these studies, Williams’ and Samuels’ are 
most similar to the present investigation; 
both studies indicate that although training 
with frequently confused graphic stimuli re- 
sults in slower acquisition, this training pro- 
duces better performance on a transfer task 
than training with infrequently confused 
stimuli. 

The fact that discrimination learning re- 
quires differential training on two or more 
stimuli is well accepted (e.g., Lashley & 
Wade, 1946). In animal operant research, it 
has been concluded that the difficulty of the 
discrimination is inversely related to the 
difference between the two training stimuli. 
The generalization gradient, however, is 
steeper if the two stimuli that are used in 
training are closely related on a stimulus di- 
mension (Terrace, 1966). The teaching of 
letter discrimination by differential training 
using high-confusion alternatives would be 
predicted to produce many training trials 
but a better performance on further tests involving 
letter discrimination; these were indeed 
the obtained results. 

Similar results have been obtained from 
discrimination training involving words as 
another type of verbal stimuli. Using lists of 
words, Levin and Watson (1963), McCutcheon 
and McDowell (1969), Otto and 
Pizzillo (1970), and Samuels and Jeffrey 
(1966) all conclude that high intralist simi- 
larity impedes rate of learning of the origi- 
nal list, but produces greater accuracy on a 
transfer list of words. 

Guralnick (1972) reviewed the research 
available on letter discrimination and con- 
cluded that the educational implication of 
this research was that pretraining on the 
distinctive features of letters would facilitate 
reading acquisition. The present study 
provides a specific suggestion to the classroom 
teacher: Letter discrimination learning 
is facilitated by training the distinctions 
between two similar letters. While using two 
dissimilar letters may result in faster ac- 
quisition, this is what Samuels and Jeffrey 
(1966) labeled “false economy,” since poorer 
results are produced in transfer situations. 

One of the questions that remains to be 
answered is whether providing the low-confusion 
group with overlearning trials, so 
that their total number of discrimination 
learning trials is yoked to that of the high-confusion 
group, would change the present 
results. Although the present design was 
analogous to classroom situations in which 
teaching ceases with mastery, some possibility 
remains that the superior posttest performance 
of the high-confusion group was 
not due to the nature of the training materials 
but rather simply to the greater number 
of training trials that this group underwent. 

In addition to demonstrating the posttest 
superiority resulting from high-confusion 
training, the second main conclusion from 
the present study was a corroboration of 
Gibson's Confusion Matrix I (Gibson et al., 
1963), since the letter-confusion errors in 
the present study correlate with Gibson's 
Confusion Matrix I with r equal to .783. 
Gibson et al. (1962) had found a correlation 
of .87 between confusion errors made with 
letterlike forms and confusion errors made 
with real letters. Since these letterlike forms 
were constructed from Gibson's theoretical 
chart of distinctive features of graphemes 
(letters), the present study indirectly 
through this series of correlations supports 
the use of Gibson's stimulus materials in 
future research as samples of high-confusion 
versus low-confusion letters or letterlike 
forms, or as distinctive versus nondistinctive 
graphemic features. 

MULTITRAIT-MULTIMETHOD ANALYSIS 
OF CONCEPTUAL TEMPO 


VERNON C. HALL AND WILLIAM J. C. RUSSELL 


Syracuse University 


The present study used the multitrait-multimethod matrix to de- 
termine the convergent and divergent validity of conceptual tempo. 
Two measures of conceptual tempo (Matching Familiar Figures Test, 
MFF, and Word Recognition Test, WRT) and two measures of intel- 
lectual functioning (Raven Coloured Progressive Matrices, RCPM, 
and Peabody Picture Vocabulary Test, PPVT) were presented to 82 
third-grade boys twice (to acquire reliabilities). It was found that the 
latencies on all instruments correlated with latencies on all other in- 
struments (no divergent validity) and that error scores on the MFF 
correlated better with error scores on the RCPM than the WRT. 


In 1964 Kagan, Rosman, Day, Albert, and 
Phillips reported a series of experiments on 
information processing. While investigating 
conceptual style, that is, whether a child 
associates objects or pictures of objects 
analytically or relationally, Kagan noted 
that children who reported more analytic 
concepts had a tendency to delay their re- 
sponse longer than children who reported 
more relational concepts. This led to the 
hypothesis of a more fundamental construct, 
which was responsible for reflective children 
producing more analytic responses. 
Confirmation for this hypothesis was found 
in the positive relationship between the 
number of correct solutions on the first 
response and the amount of latency to the 
first response on a Design Recall Test 
(DRT) constructed by Kagan. However, it 
was felt that the DRT might contain a 
memory factor that was a part of verbal 
intelligence. Because Kagan felt measures 
of this hypothesized construct were orthog- 
onal to intelligence, he constructed the 
Matching Familiar Figures Test (MFF) in 
an effort to provide a more appropriate 
(pure) measure of the construct. This MFF 
(briefly described later) has become the 
most used measure of the construct Kagan 
has labeled “conceptual tempo.” 


* Requests for reprints should be sent to Vernon 
C. Hall, Syracuse University, 331 Huntington Hall, 
Syracuse, New York 13210. 




Based on the hypothesis that time for 
reflection and analysis leads to fewer errors 
and that fast responding leads to more 
errors in a standard variant task requiring 
response uncertainty, significant negative 
correlations between latencies and errors 
on the MFF should be expected. In 14 studies 
reviewed by the present authors there 
were 37 negative correlations significant at 
the .01 level, 4 significant at the .05 level, 
and 5 failed to reach significance. Only 
one nonsignificant positive correlation was 
found. 

Using a variety of perceptual recognition 
tasks, Kagan et al. (1964) found conceptual 
tempo (latency) to be stable over time: 
"The reliability of response times over a 
nine-week period was in the high 70s and 
there was remarkable generality across [...] 
as varied as DRT, MFF and HVM [p. 
32]." 

Thus with relatively high negative correlations 
between errors and latencies on 
the MFF and similar perceptual tasks 
involving high response uncertainty and 
simultaneous presentation of alternative hypotheses, 
and adequate reliability on test-retest 
correlations, the MFF has been accepted 
as an appropriate tool for classifying 
the response patterns of lower elementary 
school subjects (Grades 1-4) using the conceptual 
tempo construct as either impulsive 
or reflective. 

Subsequently, Kagan has devised a classification 
system, employing the two measures 
(latency to first response and number of 
errors), that uses a double median split. 
Those subjects scoring below the median 
on errors and above the median on latency 
are classified as reflective; that is, they take 
longer to respond, but their first response is 
usually correct. Those above the median on 
errors and below the median on latency are 
classified as impulsive; that is, they make 
a fast initial response, but that response is 
often incorrect. This classification has typ- 
ically resulted in categorizing approximately 
70% of the sample as either reflective or 
impulsive with the remaining 30% distrib- 
uted about equally between the fast-correct 
and slow-wrong groups. 
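
The double median split described above can be sketched as a short routine. This is a minimal illustration with hypothetical subjects; the handling of scores that fall exactly on a median is not specified in the text, so labeling them "unclassified" is an assumption of this example.

```python
from statistics import median


def classify_tempo(subjects):
    """Classify subjects by a double median split.

    subjects maps a subject label to (latency_to_first_response, errors).
    Scores exactly at a median are labeled "unclassified" (an assumption).
    """
    latency_median = median(latency for latency, _ in subjects.values())
    error_median = median(errors for _, errors in subjects.values())
    labels = {}
    for name, (latency, errors) in subjects.items():
        if latency == latency_median or errors == error_median:
            labels[name] = "unclassified"
        elif latency > latency_median and errors < error_median:
            labels[name] = "reflective"      # slow and accurate
        elif latency < latency_median and errors > error_median:
            labels[name] = "impulsive"       # fast and inaccurate
        elif latency < latency_median:
            labels[name] = "fast-correct"    # fast and accurate
        else:
            labels[name] = "slow-wrong"      # slow and inaccurate
    return labels


# Hypothetical subjects: (mean latency in seconds, total errors).
sample = {"S1": (20.0, 2), "S2": (5.0, 9), "S3": (6.0, 3), "S4": (18.0, 8)}
print(classify_tempo(sample))
```

Because both cut points are sample medians, the classification of any one child depends on the other children in the sample.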

In most of the subsequent research, con- 
ceptual tempo has been widely accepted and 
studied as an important new dimension of 
intellectual development independent from 
intelligence. It is obvious that if this is in- 
deed a new trait, it is an important one for 
education. Teachers, unaware of this native 
response tendency, would tend to make 
school a relatively unhappy place for children 
by unnecessarily and inappropriately 
hurrying the reflectives or slowing the impulsives. 

It is important in psychology, however, 
to examine carefully the possibility that any 
new trait is indeed new rather than simply 
being a different term for something already 
known. Campbell and Fiske (1959) recog- 
nized this danger and pointed out that mea- 
sures of individual differences have typi- 
cally been validated by the means of con- 
vergence; that is, a significant correlation 
between a new measure of a concept and an 
independent and accepted measure of the 
same concept has been a common procedure 
for validating a new construct (e.g., MFF 
and DRT). They go on to suggest that a 
new construct should also demonstrate dis- 
criminant validity. That is, a test measur- 
ing a construct can be invalidated by too 
high a correlation with another test from 
which it is intended to differ (because it 
measures a different construct). 
In the present case the authors were concerned 
that it be demonstrated more con- 
clusively that the conceptual tempo trait 
has discriminant validity with the trait of 
intelligence. A review of the literature led 
to the conclusion that the evidence for dis- 
criminant validity is less than conclusive. 
From observing Table 1, it can be determined 
that of the 42 correlations reported between 
MFF errors and measures of intellectual 
functioning in 11 studies, 21 were signifi- 
cant. Fourteen of 42 correlations between 
latencies on the MFF and measures of in- 
tellectual functioning were significant. 

In order to examine both the convergent 
and discriminant validity of a trait, Camp- 
bell and Fiske (1959) suggested that more 
than one trait be measured and that more 
than one method be used to measure each 
trait. Results can then be plotted in a multitrait-multimethod 
matrix and examined for reliability 
(the same trait by test-retest), 
convergent validity (the same trait mea- 
sured by different methods), and discrim- 
inant validity (different traits measured by 
the same method). In the latter case, rela- 
tively low correlations due to common 
method variance would be expected. 
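
The three kinds of entry in such a matrix can be illustrated with a small sketch. Each measure is identified here by a hypothetical (trait, method, occasion) triple, and the scores are random numbers generated for the example, not data from this study; the sorting rule follows the definitions given in the paragraph above, with the remaining cells given Campbell and Fiske's label "heterotrait-heteromethod".

```python
from itertools import combinations
from math import sqrt
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def block_type(a, b):
    """Name the matrix block for two distinct (trait, method, occasion) measures."""
    trait_a, method_a, _ = a
    trait_b, method_b, _ = b
    if trait_a == trait_b and method_a == method_b:
        return "reliability"               # same measure, test versus retest
    if trait_a == trait_b:
        return "convergent"                # same trait, different methods
    if method_a == method_b:
        return "discriminant"              # different traits, same method
    return "heterotrait-heteromethod"      # different traits and methods


# Hypothetical design: 2 traits x 2 methods x 2 occasions, 20 subjects each.
rng = random.Random(0)
measures = [(t, m, o) for t in ("trait1", "trait2")
            for m in ("methodA", "methodB") for o in (1, 2)]
scores = {key: [rng.gauss(0, 1) for _ in range(20)] for key in measures}

mtmm = {}
for a, b in combinations(measures, 2):
    mtmm.setdefault(block_type(a, b), []).append(pearson_r(scores[a], scores[b]))

for name, values in mtmm.items():
    print(name, len(values), round(sum(values) / len(values), 3))
```

With real data, the pattern of interest is high values in the reliability and convergent blocks together with low values in the discriminant block.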

The purpose of the present study was to 
employ the multitrait-multimethod matrix 
to determine whether the conceptual tempo 
individual difference construct is (a) stable 
over time, (b) stable across instruments, 
and (c) orthogonal with measures of in- 
tellectual functioning. 


METHOD 


Subjects 


Subjects for this study were 85 white, third- 
grade boys from six classrooms in two elementary 
schools in a middle-class suburban school district 
(3 subjects were dropped because they were absent 
on their scheduled date for retest; thus the anal- 
yses contained 82 subjects). The mean age of 
the subjects was 8 years, 10 months, with a standard 
deviation of 6.5 months. 

Boys were selected because most studies using 
both sexes have found significant sex differences 
in responses on the MFF. Third-graders were used 
because Kagan et al. (1964) developed the MFF 
while studying children in Grades 1 through 4. 
Achenbach (1970) suggested that the MFF de- 
creases in validity with fifth- and sixth-graders. 
TABLE 1 
CORRELATION BETWEEN TIME AND ERROR SCORES ON THE MATCHING FAMILIAR FIGURES 
TEST AND SCORES ON OTHER MEASURES OF INTELLIGENCE AND ACHIEVEMENT 

Study | Errors-IQ, Male | Errors-IQ, Female | Latency-IQ, Male | Latency-IQ, Female | Grade | Instrument 
Kagan, Rosman, Day, Albert, & Phillips, 1964 | -.53** | -.28 | .36 | .22 | 3 | IQ based on WISC verbal scale IQ 
 | [illegible] | [illegible] | .05 | .15 | 3 | IQ based on CTMM 
 | .10 | -.40* | -.13 | -.08 | 4 | IQ based on WISC verbal scale IQ 
Kagan, 1965a | -.26 | -.32 | .27 | .95** | 1 | IQ based on WISC verbal scale IQ 
 | -.25 | -.20 | .00 | .46** | 2 | IQ based on WISC verbal scale IQ 
 | -.86** | -.90 | .86** | .19 | 3 | IQ based on WISC verbal scale IQ 
 | -.88* | -.02 | .12 | -.16 | 3 | IQ based on WISC verbal scale IQ 
Kagan, 1966a | -.29* | -.20 | .08 | .25* | 1 | IQ based on WISC verbal scale IQ 
Kagan, 1966b | -.53** | -.28 | .96** | .22 | 2, 3 | IQ based on WISC verbal scale IQ 
 | .10 | -.40* | -.13 | -.03 | 3, 4 | Same subjects retested 
Lewis, Rausch, Goldberg, & Dodd, 1968 | -.30 | -.60** | .30 | .45* | Preschool | IQ based on Stanford-Binet 
 | [row partly illegible; legible values: -.40, -.67**] 
Yando & Kagan, 1968 | -.36** | -.33** | .20 | [illegible] | 1 | Metropolitan Reading Readiness 
 | [illegible] | -.48** | .16 | .40** | 1 | Same subjects retested 
Ward, 1968 | [illegible] | [illegible] | .23 | .23 | Kindergarten | IQ based on PPVT 
Meichenbaum & Goodman, 1969 | [rows partly illegible; legible values: .15, .39**, .82**, .39*] | Kindergarten | IQ based on PPVT 
Eska & Black, 1971 | [row illegible] 
Harrison & Nadelman, 1972 | [row partly illegible; legible values: -.80, .00] | Preschool | IQ based on PPVT 
Lindstrom, 1972 | [column alignment not recoverable; legible values: .08, .02] | 2 | WISC, verbal scale IQ 
 | [legible values: .20, -.08] | 2 | WISC, performance scale IQ 
 | [legible value: .00] | 2 | WISC, full scale IQ 
 | [legible value: .33**] | 2 | Raven Coloured Progressive Matrices (errors) 

Note. WISC = Wechsler Intelligence Scale for Children; CTMM = California Test of Mental 
Maturity; PPVT = Peabody Picture Vocabulary Test; PMA = Primary Mental Abilities. 
** p < .01. 

Thus, third-graders should be experienced enough 
in the school setting to feel secure in the testing 
situation and well within the validity range of the 
MFF. 


Instruments 


Four instruments were employed in this study: 
two designed to measure conceptual tempo and two 
designed to measure intellectual functioning. All 
shared the same format (multiple choice) and 
yielded the same scores (latency, number correct, 
and errors). 

Measures of conceptual tempo were as follows: 

1. Matching Familiar Figures Test. As men- 
tioned earlier this test has been the most fre- 
quently used instrument to measure conceptual 
tempo. This 12-item test (there are now several 
forms but the original was used), a visual match- 
ing task, requires the subjects to select the one 
stimulus from an array of six variants that is 
identical to a standard. The stimuli consist of 
line drawings of familiar objects. 

2. Word Recognition Test (WRT). This 20-item 
test, which was constructed by the present ex- 
perimenters, was patterned after a test developed 
by Kagan (1965b) to study word-recognition skills. 
He found, “Word-recognition errors were nega- 
tively related to response time on MFF for both 
sexes (coefficients were significant at .01 or beyond 
for the two-tailed test). In addition, high-error 
scores on MFF and HVM each predicted high 
word-error scores (p < .05 or beyond) [Kagan, 
1965b, p. 615].” 

For this test, two of Kagan’s test items (in- 
cluded in the original article) were used as practice 
items. Included in the 20 test items were four 
types of stimuli: (a) Words in which the initial 
phonemes were similar; that is, shone, shore, shove, 
shame, and shave, with shove being the stimulus 
word. (b) Words in which the final phonemes were 
identical; that is, bang, rang, hang, sang, gang, with 
rang being the stimulus word. (c) Multisyllable 
words with the same initial phoneme; that is, 
quagmire, quadrang, quadrate, quartile, and quad- 
roon, with quadroon being the stimulus word. (d) 
Multisyllable words with the same final phoneme; 
that is, reflective, additive, sensitive, tentative, 
and creative, with sensitive being the stimulus 
word. 

All positions were represented an equal number 
of times for right answers. As an additional attempt 
to ensure response uncertainty, all words were on 
or above the third-grade reading level. For both 
measures of conceptual tempo the items were 
scored for response time to first selection, number 
of errors before finding the correct response, and 
number of correct first responses. 

Measures of intellectual functioning were as 
follows: 

1. Raven's Coloured Progressive Matrices 
(RCPM, 1965). In this 36-item test, the subject is 
asked to select the correct variant from an array 
of six variants that will complete a matrix. In a 
review of more than 70 studies, Burke (1958) concluded 
that while the RCPM was not a substitute 
for the Stanford Binet or Wechsler tests, it “is 
perhaps an almost equally useful supplement and 
shows intercorrelations with such tests perhaps 
as high as they show with one another [p. 222].” 
Burke found reports of studies using normal 
children in which there were correlations between 
the RCPM and the Wechsler Intelligence Scale 
for Children (WISC) full-scale, verbal, and performance 
IQs of .91, .84, and .83, respectively. 

2. Peabody Picture Vocabulary Test (PPVT).
In this test the child is asked to select the correct 
picture from an array of four pictures that repre- 
sent the stimulus word pronounced by the ex- 
perimenter. The PPVT was developed as a measure 
of verbal intelligence, and the manual reports a 
number of high correlations with other stan- 
dardized measures of intellectual ability. For both 
measures of intellectual functioning, items were 
scored for response time to first selection, number 
of errors, and number of correct responses. Be- 
cause these two tasks did not require that the 
child continue to guess until the correct answer 
was achieved (except for the first two items on the 
RCPM), the error scores were mirror images of the 
correct scores. 


Procedures 


Subjects were tested individually in an empty 
room. All subjects were eager to participate on 
both the test and retest. To control for order 
effects, each order was administered an equal 
number of times (seven), with the exception of 
four orders that were administered six times. In
addition, for each subject, the test and retest 
orders were different. 
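
The counterbalancing figures above can be checked with a little arithmetic. The sketch below assumes (this is an inference, not something the text states) that the units being ordered were the four instruments, giving 24 possible orders, and that each of the 82 subjects reported in the note to Table 2 received one order at the test and one at the retest. The function name and dictionary keys are illustrative only.

```python
from itertools import permutations

# Assumption: the four instruments named in the note to Table 2 were the
# units being ordered, and every subject took one order per session.
INSTRUMENTS = ("MFF", "WRT", "RCPM", "PPVT")


def allocate_orders(n_subjects, n_sessions, instruments=INSTRUMENTS):
    """Spread all administrations as evenly as possible over the orders."""
    n_orders = len(list(permutations(instruments)))
    administrations = n_subjects * n_sessions
    base, leftover = divmod(administrations, n_orders)
    # `leftover` orders receive one extra administration; the rest get `base`.
    return {
        "orders": n_orders,
        "administrations": administrations,
        "orders_used_more": leftover,
        "times_used_more": base + 1 if leftover else base,
        "orders_used_less": n_orders - leftover,
        "times_used_less": base,
    }


if __name__ == "__main__":
    print(allocate_orders(n_subjects=82, n_sessions=2))
```

Under these assumptions the allocation comes out as 20 orders used seven times and 4 orders used six times, which agrees with the description in the text.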

A test-retest design, with a one-week interval
between test sessions, was used to secure reli-
abilities for the multimethod-multitrait correla-
tional analysis. Standardized instructions already
available were used for all instruments except the
WRT, for which the following instructions were used:

“In a minute I will say a word, then I will show
you a card with five words printed on it. I want
you to point to the word I said. If you point to
the wrong one, I will just say ‘no, try again’ until
you get the right one.”


ANALYSES 


Correlations between methods and traits were
computed and analyzed for convergent and dis-
criminant validity following the suggestions of
Campbell and Fiske (1959). Three correlation
matrices were computed: one using results from the
first administration, one using results from the sec-
ond administration, and one in which each pair
of correlations was transformed into a mean corre-
lation, using Fisher's table for r to z transforma-
tions.

An additional estimate of the stability of the
MFF as a measure of [...] ([...], 1965) was calculated
on the distribution of subjects [...] the diagonal of a
fourfold table (see Table 3). If a subject obtained the same
classification, he would be classified in one of the
[...] the diagonal of the fourfold table. This
[...] was computed for each instrument.
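
The averaging of a test and a retest correlation through Fisher's r to z transformation can be sketched as follows. The authors used a printed table; the functions below use the equivalent closed form (z is the inverse hyperbolic tangent of r), and the function names and the example values are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


def mean_correlation(r_test, r_retest):
    """Average two correlations on the z scale, then convert back to r."""
    z_mean = (fisher_z(r_test) + fisher_z(r_retest)) / 2
    return inverse_fisher_z(z_mean)


if __name__ == "__main__":
    # Hypothetical pair of correlations from the two administrations.
    print(round(mean_correlation(0.20, 0.60), 3))
```

Averaging on the z scale rather than averaging the raw coefficients gives slightly more weight to the larger correlation, which is why the result for two unequal values is not simply their arithmetic mean.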


936 VERNON C. HALL AND WILLIAM J. C. RUSSELL 
TABLE 2
MEANS AND STANDARD DEVIATIONS FOR ALL INSTRUMENTS

                  Error             Correct            Latency
Measure        Test   Retest     Test   Retest     Test   Retest
MFF
  M           16.35   15.38      3.87    4.44       .51    7.30
  SD           5.02    5.57      1T      1.73      3.95    4.78
WRT
  M            3.41    2.72     17.95   17.89      3.38    2.87
  SD           2.63    3.33      2.01    2.25      1.28    1.56
RCPM
  M           14.99   12.66     21.57   23.82      4.57    3.76
  SD           4.46    4.68      4.51    4.70      1.41    1.10
PPVT
  M           49.66   48.32     75.34   76.68      3.27    2.53
  SD           8.48    8.10      8.48    8.10      1.29    1.08

PPVT IQ        Test   Retest
  M          108.41  110.57
  SD          15.03   14.23


Note. MFF = Matching Familiar Figures Test; WRT = Word-Recognition Test; RCPM = Raven 
Coloured Progressive Matrices; PPVT = Peabody Picture Vocabulary Test. N = 82. 


RESULTS 


Table 2 includes the means and standard 
deviations for each instrument on the test 
and retest. It can be seen from the means 
and standard deviations of IQ scores on the 
PPVT that these subjects were slightly 
above average but well within the normal 
range. 


Reliability 


The values on the diagonal of Table 4 are 
the reliabilities. It can be seen that all of the 
reliabilities are significant at the .05 level 
and all except one (MFF correct score) are 
significant at the .01 level. Interestingly 
enough, two of the three lowest values are 
for the MFF. Table 3 includes the frequency 
distributions and Cramer’s coefficients for 
the four measures. It should be noted that 
while all of the values are significant (mean- 
ing there was stability across administra- 
tions), the distribution of the MFF scores 
more closely resembled that of the RCPM 
than the WRT. 


Convergent Validity 


Table 4 also contains the mean correla- 
tion values acquired from Fisher’s r to z 
transformation, which the authors believe 
is the most appropriate for purposes of the 


present study. The only correlations that 
might have led to different conclusions (i.e., 
were significant on one computation but 
not another) were the correlations between 
MFF correct and RCPM correct, PPVT 
errors and RCPM errors, and MFF errors 
and PPVT errors, which were significant on 
the retest only. Convergent validity was 
determined by the relationship between 
the same measures (error, correct, and la- 
tency) derived from two instruments de- 
signed to measure the same traits (MFF 
and WRT for conceptual tempo; RCPM 
and PPVT for intellectual ability). The rel-
evant correlations are underlined once in
Table 4. Although few of the correlations
are particularly impressive, all three scores 
acquired from the two measures of intel- 
lectual ability are significantly correlated 
while only the latency score is significantly 
correlated for the two measures of con- 
ceptual tempo. 


Discriminant Validity 


The discriminant validity was ascertained
by determining the correlations between 
the same scores obtained on the tests of dif- 
ferent traits. The relevant correlations are 
underlined twice in Table 4. In this case the 
three scores on the two measures of con-
ceptual tempo were significantly correlated
with the three scores obtained on the RCPM 
(a different trait). These correlations were 
all higher than the correlations representing 
convergent validity for the conceptual 
tempo trait. In addition, the latency scores
(which Kagan maintains are the most ac-
curate measure of conceptual tempo) on
both measures of conceptual tempo are cor-
related with the scores obtained on the
PPVT.


Discussion 


The authors believe that these results 

add important information concerning the 
nature of what has been labeled conceptual 
tempo. In the first place, there does indeed 
seem to be a consistent tendency to display 
slow or fast response times in problem situ- 
ations. The latency reliabilities were all 
quite high, and latencies for all measures 
were significantly correlated with latencies 
for all other measures. It appears that if 
one wants to use response latency as a mea- 
sure of conceptual tempo, then any of these 
instruments would suffice. In effect, there 
was no divergent validity demonstrated for 
the latency measure. 
The degree to which response uncertainty
is a necessary condition for this tendency 
to be displayed is somewhat less clear. If 
one maintains that a negative correlation 
between response time and errors is neces- 
sary evidence for response uncertainty be- 
cause quick responders will be inaccurate, 
then the answer is no. On both the WRT 
and PPVT the relationship between errors 
and latency was nonsignificant (yet the 
latency reliability for the WRT was the 
highest of all measures). This would mean 
that the words used on the WRT were not 
enough alike to produce sufficient uncer- 
tainty for these children. 

When we turn to errors or number correct, 
the picture changes. The reliability for 
these measures on the MFF is the lowest 
of the four instruments used. This may be 
due to the fact that it is really a learning 
task (e.g., subjects keep choosing until they 
locate the correct answer). On the other 
hand, the mean improvement on the MFF 
from test to retest was less than one item 
TABLE 3
TEST-RETEST DISTRIBUTIONS AND CRAMER'S
COEFFICIENTS FOR THE FOUR INSTRUMENTS
USING THE DOUBLE MEDIAN SPLIT

                                  Posttest
                   Slow and    Reflec-    Fast and    Im-
Pretest            wrong       tive       right       pulsive

Matching Familiar Figures Test (φ' = .411)
Slow and wrong         3          6          1           7
Reflective             6         14          2           3
Fast and right         3          1          1           7
Impulsive              5          3          7          13

Raven Coloured Progressive Matrices (φ' = .437)
Slow and wrong         9          2          0           4
Reflective             5         12          7           2
Fast and right         2          6          2           2
Impulsive              4          1          3          21

Word Recognition Test (φ' = .492)
Slow and wrong        18          6          0           5
Reflective             4          8          3           0
Fast and right         0          0         16           6
Impulsive              3          2          5           6

Peabody Picture Vocabulary Test (φ' = .593)
Slow and wrong        16          2          1           4
Reflective             6         10          2           0
Fast and right         0          3         18           2
Impulsive              2          2          2          12


(the improvement on both the PPVT and 
RCPM was more than one). In addition, 
the same reliabilities for the WRT (pre- 
sented in the same manner as the MFF) 
were excellent. This low reliability brings 
into question the procedure of using a double 
median split to classify subjects as reflec- 
tives and impulsives (over half of the stu- 
dents changed in classification from pretest 
to posttest).
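
The statement that over half of the students changed classification can be checked directly from the frequencies in Table 3. The sketch below copies the four pretest-by-posttest tables (rows are pretest categories, columns posttest categories, both in the order slow and wrong, reflective, fast and right, impulsive) and computes the proportion of subjects falling off the diagonal. It is only an illustration of the diagonal count, not a reproduction of the authors' Cramer's coefficient, and the variable and function names are hypothetical.

```python
# Frequencies as printed in Table 3 (rows: pretest, columns: posttest).
TABLE_3 = {
    "MFF":  [[3, 6, 1, 7], [6, 14, 2, 3], [3, 1, 1, 7], [5, 3, 7, 13]],
    "RCPM": [[9, 2, 0, 4], [5, 12, 7, 2], [2, 6, 2, 2], [4, 1, 3, 21]],
    "WRT":  [[18, 6, 0, 5], [4, 8, 3, 0], [0, 0, 16, 6], [3, 2, 5, 6]],
    "PPVT": [[16, 2, 1, 4], [6, 10, 2, 0], [0, 3, 18, 2], [2, 2, 2, 12]],
}


def proportion_changed(table):
    """Share of subjects whose pretest and posttest categories differ."""
    total = sum(sum(row) for row in table)
    unchanged = sum(table[i][i] for i in range(len(table)))
    return (total - unchanged) / total


if __name__ == "__main__":
    for name, table in TABLE_3.items():
        print(name, round(proportion_changed(table), 3))
```

For the MFF, 31 of the 82 subjects lie on the diagonal, so 51 (about 62%) changed classification; the other three instruments each show a smaller share of changes, in the same rank order as the coefficients reported in the table.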
The contention that conceptual tempo is 
independent from verbal intelligence is sup- 
ported by the lack of significant relationship 
between errors and number correct on the 
MFF and the same measures on the PPVT. 
The fact that the latencies were signifi-
cantly correlated with number correct (and
number wrong) on both the MFF and 
RCPM, and that there were also significant 
correlations between errors and number cor- 
rect on the MFF with errors and number 
correct on the RCPM, gives us some indi- 
cation of the kinds of abilities measured by 
the MFF. Here one could contend that the 
MFF is a rather poor (due to the low test- 
retest reliability) measure of the same abil- 
ities as those measured by the RCPM. These 
abilities have recently become the subject of
considerable interest in developmental and 
educational psychology. 
EFFECTS OF VICARIOUS CONSEQUENCES AND MODEL AND
EXPERIMENTER SEX ON IMITATIVE BEHAVIOR
IN FIRST-GRADE CHILDREN


CANDACE S. GARRETT AND DONALD J. CUNNINGHAM


Indiana University 


This study investigated the effects of sex of model and experimenter 
and the consequences (verbal reward, punishment, or ignoring) of the 
model's behavior in a non-sex-typed modeling situation on the imita- 
tive behavior of first-grade children. The results indicated (a) reward 
and ignore conditions were not different but both yielded higher imita- 
tive scores than the punish condition; (b) same-sex models (reward 
condition) yielded higher imitation scores than opposite-sex models; and
(c) highest imitation scores (reward condition) were obtained by chil- 
dren exposed to same-sex models and experimenters, and lowest imita- 
tion scores (punish condition) by children exposed to a male experi- 
menter and a female model. These results are discussed in relation to a 
social learning theory of sex-role development and methodological
considerations.


Current popular interests in maximiz-
ing children’s potentials and in guarantee-
ing individuals’ rights have raised concerns
that some sex-typed behaviors and atti- 
tudes may be potentially limiting to chil- 
dren of both sexes. An understanding of the 
development of these behaviors is an impor- 
tant and necessary first step toward revising, 
if possible, the environmental constraints 
that produce these limitations. A survey of
the available literature (Garrett, Cunning- 
ham, & Buelow, 1974) unearthed only a 
few experimental studies concerning as-
pects of sex-role development. Most of the
research reviewed in that paper came from 
studies that were not designed to investigate 
the effects of sex in the various experimental 
situations; sex was usually secondary to the
purposes of the studies. Moreover, the theo-
retical issues examined in those papers were
not usually tied directly to sex-role develop- 
ment. There is, therefore, a need for theo- 
retically based empirical research aimed 
directly at understanding the course of sex-
role development.

Three major sets of theories can be iden- 
tified that purport to describe and explain 
sex-role development: psychoanalytic, cog- 
nitive-developmental, and social learning 
theory (Mischel, 1970). The last theory ap- 
pears to be the most appealing as it is both 
comprehensive and amenable to operational 
formulation. According to social learning 
theory, the development and emergence of 
sex-typed behaviors and attitudes can be 
described by the same learning principles 
used to account for any other aspect of a 
person's social behavior, generally principles 
related to reinforcement and imitation 
(Bandura, 1971). This theory emphasizes 
characteristics of the subject and his or her 
environment that contribute to and affect 
sex-role development. 

The results of past research related to
sex-role development correspond reasonably
well with this theory. Also, social learning
theory has been systematically investigated
as a basic model for a wide variety of social-
emotional learning, such as aggression (Ban-
dura, Ross, & Ross, 1963a), physical affec-
tionate behavior (Fryrear & Thelen, 1969),
self-reward criteria (Bandura & Kupers,
1964), and moral judgments (Bandura & 
McDonald, 1963). If the acquisition of sex- 
role behaviors and attitudes can be analyzed 
in terms of this learning theory and if the 
relevant subject and environmental vari- 
ables can be identified and interrelated, then 
it should be possible to foster less stereo- 
typed, more flexible, sex-role adaptions 
through environmental manipulations. 

Of interest in this study were the following 
variables: vicarious reward or punishment, 
the sex of the subject, the current sex-role 
preference of the subject, the sex of the 
model, and the sex of the experimenter who 
managed the imitation situation and dis- 
pensed verbal rewards or punishments to the 
model as a consequence of the model's be- 
havior. With respect to vicarious reward or 
punishment, it was expected that children— 
when placed in an imitation situation where 
they observe a model’s responses either re- 
warded, punished, or ignored—will imitate 
models who are verbally rewarded, counter- 
imitate (i.e., choose the opposite responses 
from) models who are verbally punished, 
and neither imitate nor counterimitate mod- 
els who are verbally ignored. Support for 
this hypothesis can be found in Bandura, 
Ross, and Ross (1963b) and Liebert, Sobol, 
and Copemann (1972), among others (see 
Garrett, Cunningham, & Buelow, 1974, for 
further discussion of this and the other hy- 
potheses). The relationship between conse- 
quences of models’ behaviors and subjects’ 
imitation responses has not always been 
symmetrical, however. In the study reported 
by Bandura et al. (1963b), children dis- 
played more imitative aggression after they 
had seen an aggressive model rewarded for 
aggression, but children who viewed an ag-
gressive model punished for aggression did
not reduce their level of imitative aggression
below that of the control group who viewed
a nonaggressive model or the control group
that did not view a model. 

The remainder of our hypotheses con- 
cerned the effects of sex of model and sex of 
experimenter on the imitative responses of 
first-grade children. Experimenters and 
models were expected to produce more imita-
tive responses in reward conditions and
counterimitative responses in punishment 
conditions when they were the same sex as 
the subject. Furthermore, it was predicted 
that male models and experimenters would 
produce more imitation or counterimitation 
than female models and experimenters. 

From a theoretical perspective, it is pos- 
sible to predict that either males or females 
would be the more salient models and experi- 
menters for young children. From a rein- 
forcement viewpoint, some researchers and 
theorists argue that the female may be the 
more salient since children are exposed to 
females more than to males (Maccoby, 
1959). In addition, mothers and teachers, 
usually female, reinforce children for imi- 
tating more than fathers and other males do 
(Rosenblith, 1961). However, the father, 
as a reality enforcer, may be a more salient 
reinforcer and model (Epstein & Liverant,
1963). He is given more prestige in this cul- 
ture (Cook & Smothergill, 1973), and males 
are unusual in a school setting (Stevenson, 
Keen, & Knights, 1963). From a sex-role 
development perspective, by the time a 
child enters school, people of both sexes are 
positively reinforcing the child for appropri- 
ate sex-typed behaviors, perhaps learned by 
imitating a like-sex model. 

Children undoubtedly learn at least some 
of their appropriate sex-typed behaviors 
through observational learning from like-sex 
models (Mischel, 1966) as well as from di- 
rect tutorage and reinforcement (Bandura, 
1969). This may result in a tendency for 
children to imitate models of the same sex. 
For instance, Garrett (1971) found that 
pairs of opposite-sex, first-grade children 
who watched a videotape of a pair of oppo- 
site-sex, junior-high-school-age models en- 
gage in either appropriate or inappropriate 
sex-typed behaviors tended to imitate the be-
haviors of their like-sex models in a similar
situation. Not all studies have found such 
effects, however (see Garrett et al., 1974). 

The present study, then, was designed 
specifically to investigate the effects of sex 
of model and experimenter on children in 
a non-sex-typed imitation situation where
the subject observes the experimenter either
reward, punish, or ignore the model's re-
sponses. In this way, we wanted to explore
the power of social learning theory to pre- 
dict children's modeling behavior in circum- 
stances involving variables that are least 
potentially relevant to sex-role development. 


HYPOTHESES 


1. Children will imitate models who are 
verbally rewarded by the experimenters, 
counterimitate models who are verbally 
punished, and neither imitate nor counter- 
imitate models who are verbally ignored. 

2. Males will be more salient* models and
experimenters and will make the most sali- 
ent combination of model and experimenter 
when vicarious reinforcement is involved. 

3. The same-sex model, experimenter, and 
model-experimenter combinations as the 
subject will be the most salient when vicar- 
ious reinforcement is involved; that is, 
even with overall male saliency, there will 
be like-sex modeling within each subject’s 
sex. 


METHOD


The methodology in this study is modeled after
that developed by Liebert et al. (1972) in a study
examining the effects of vicarious consequences and
race of model on imitative performance by black
children.


Subjects 


Subjects were drawn from first-grade classrooms 
made available by the Monroe County Community 
School Corporation in Bloomington, Indiana. One 
classroom of 16 boys and 12 girls was located in a 
predominantly rural portion of Monroe County;
this classroom was used to pretest pairs of stimulus 
pictures for use in the actual experiment. The sub- 
ject pool from which subjects were selected for the 
main experiment came from two other schools: four 
classes (72 children) from a school serving mainly 
an upper-middle-class urban population and two 
classrooms (52 children) from a school serving a 
blue-collar urban population. Although the hetero- 
geneous nature of the populations tested introduced 
extraneous variance into the design, the authors 
were forced to work with available populations. Ab- 
sences reduced the final pool of subjects to 115 
children (62 males and 53 females). From this pool
48 children of each sex were drawn to complete the
design in a manner described below. Approximately
58% of the children who actually served as experi-
mental subjects came from the middle-class school,
42% from the other school.

*Saliency is defined as the degree to which the
condition produces imitative responses when the
model is verbally rewarded and counterimitative
responses when the model is verbally punished.


Experimenters and Models 


The adult experimenters and models used in the 
study were graduate and undergraduate students 
from the Indiana University School of Education.
In order to decrease the possibility that nonsex
differences in appearance or behavior among the
models and experimenters would produce unreliable
effects, two male and two female models and two
male and two female experimenters were used. The
experimenters and models were balanced across the 
design. Several training sessions were provided prior 
to the experiment; in these the purpose of the ex- 
periment was explained to all experimenters and 
models, and they practiced their roles. 


Materials 


Stimulus pictures. The stimulus pictures used in 
the experimental phase of the study were taken 
from the Peabody Language Development Kit, 
Level 1 (1965). Fifty pairs of pictures of a variety 
of objects (e.g., food, vehicles, animals, etc.) were
selected as having no obvious differential appeal by
sex. These picture pairs were pretested by a male 
and a female experimenter in a group, dichotomous- 
choice format in the first-grade classroom described 
above (Garrett et al., 1974). 

Ten sets of picture pairs that showed equal or 
nearly equal preference among both boys and girls 
were selected for use in the main experiment. More 
heterogeneity in preference was found than had
been expected, but the balance of preferences be- 
tween pictures by boys or girls never exceeded a 
4:7 ratio. 

Group Toy Preference Test. The Group Toy
Preference Test, developed by Anastasiow (1963), 
was administered to all available subjects in the 
two schools described above. This test uses pictures 
of five “masculine” and five “feminine” objects and 
provides a measure of current sex-role preference, 
which was used as a blocking variable in this de-
sign. The objects had differentiated significantly
between girls and boys (p < .001) in research per- 
formed earlier using actual toys. Details concerning 
the administration and scoring of this test may be 
found in Garrett et al. (1974). 

In evaluating the test, Anastasiow's test-retest
comparisons yielded a rank-order correlation of
.96 and a Pearson product-moment correlation of
.81 on individual scores. Comparisons with the
Wright Picture Preference Test and the Sears-
Wright Toy Preference Test yielded significant
correlations in test results, convincing Anastasiow
that the three tests were measuring the same con-
struct.


Procedures 


The Group Toy Preference Test was adminis-
tered by a male and a female experimenter to intact
classrooms approximately two weeks before the
main experiment began. One experimenter read the 
directions and paced the subjects, while the other 
supervised the subjects and attempted to resolve 
any problems. The experimenters reversed their 
roles in each successive classroom so that half of 
the time each sex served in each role. Procedures 
for administration were adapted from those used 
by Anastasiow (1963). 

Scores for each subject were computed as speci- 
fied above and quartiles determined for purposes of 
assigning subjects to the design. 

The experimental phase of the study was imple- 
mented over a period of one week during the spring 
of 1973. The experimental sessions were held in 
school hallways or unused rooms. Despite less than 
ideal conditions, there were few external disturb- 
ances. The experimenter went to the subject’s class- 
room, introduced himself (or herself), and brought 
the child back to the experimental location. On the 
way, the experimenter chatted with the subject, 
telling the subject that he was going to “help us 
pick some pictures.” At the experimental location, 
the experimenter introduced the subject to the 
model by name and asked the subject to sit down. 
The model always sat directly across a table from 
the experimenter, with the subject sitting by the 
model's side. 

The experimenter then said, “I am going to show
you both some pictures of objects. First I am going
to show —— [the model] the pictures, and he/she
will tell me which ones he/she likes. Then after ——
is done, it will be ——’s [the subject’s] turn to tell
me which picture he/she likes best. All right, ——,
which of these two pictures do you like best: the
hot dog or the hamburger?”

The 10 pairs of pictures were then presented to 
the model, one pair at a time with the pictures held 
side by side. The same sequence for presenting the 
pictures was used throughout the experiment. The
[...] so that each picture was
always presented [...] of the pair. The
experimenter had a record sheet for each subject,
indicating the subject’s assigned experimental treat-
ment (reward, punish[...]
choices) and the random[...]
the model was to make [...].

After the experimenter [...]
preference within each picture pair, the model
paused about two seconds and then responded by
pointing to the appropriate picture and saying
“This one.” No other verbalizations or actions were 
made by the model. Depending on the subject’s 
treatment condition, the experimenter made one of 
the following responses to the model's choice:

Reward the model: “That's a good one.” or “That’s the better one.”
Punish the model: “That’s not a good one.” or “That's not the best one.”
Ignore the model: no response was made to the model's choice.

The experimenter also marked the model's choices
on the record sheet to maintain the illusion for the
subject that the experimenter was keeping a record
of the model’s choices.

After the model had viewed all 10 picture pairs, 
the experimenter thanked him; the model left the 
experimental location. The subject was then told, 
“Now, it’s your turn. Tell me which of these two
pictures you like the best: the hot dog or the ham-
burger.” The experimenter then proceeded as with
the model, presenting the pictures and asking for
and recording the subject's preferences. All approv- 
ing or disapproving comments were avoided. When 
the subject finished, he or she was praised for doing 
a good job and helping the experimenter and then 
returned to the classroom. Total time for adminis-
tering the procedures was approximately five min-
utes. Only minor deviations from these procedures
occurred.


Design 

The design of this study was a 3 × 2 × 2 × 2
(Vicarious Consequences × Sex of Experimenter ×
Sex of Model × Sex of Subject) full factorial design
with sex-role preference quartile as a blocking vari-
able. This design resulted in four subjects in the
highest-order interaction cell.
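
The cell sizes implied by this design can be worked out by enumerating the factorial cells. The sketch below takes the 96 subjects (48 of each sex) described under Subjects and the four crossed factors named above; the level labels, dictionary keys, and function names are illustrative, and the blocking variable is left out for simplicity.

```python
from itertools import product
from math import prod

# The four crossed factors of the design; level labels are illustrative.
FACTORS = {
    "consequence": ("reward", "ignore", "punish"),
    "experimenter_sex": ("male", "female"),
    "model_sex": ("male", "female"),
    "subject_sex": ("male", "female"),
}
N_SUBJECTS = 96  # 48 children of each sex


def cells(factors=FACTORS):
    """Every combination of factor levels in the full factorial design."""
    return list(product(*factors.values()))


def n_per_combination(crossed, factors=FACTORS, n_subjects=N_SUBJECTS):
    """Subjects per level combination when only `crossed` factors are used."""
    n_combinations = prod(len(factors[name]) for name in crossed)
    return n_subjects // n_combinations


if __name__ == "__main__":
    print(len(cells()))                                       # number of cells
    print(n_per_combination(FACTORS))                         # per full cell
    print(n_per_combination(["consequence"]))                 # per condition
    print(n_per_combination(["consequence", "model_sex"]))    # per condition x model sex
```

With 24 cells and 96 subjects this gives four subjects per highest-order cell, 32 per consequence condition, and 16 per condition-by-model-sex combination, the same group sizes that appear in the comparisons of Table 1.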


Analysis 

The data were analyzed using analysis of vari-
ance techniques with planned orthogonal compari-
sons. Hypothesis 1, the main effect of vicarious
consequences, was tested with Scheffé multiple
comparisons. Hypothesis 2, the interactions of
vicarious consequences with sex of the model (two-
way), with sex of the experimenter (two-way), and
with sex of the model and sex of the experimenter
(three-way), was tested with planned orthogonal
comparisons. Hypothesis 3, the interactions of vi-
carious consequences and sex of the subject with
sex of the model (three-way), with sex of the exper-
imenter (three-way), and with sex of the model and
sex of the experimenter (four-way), was tested with
planned orthogonal comparisons.


RESULTS 


Table 1 presents the three general hy-
potheses, the subhypotheses derived from
these for testing the effects of sex of the
model, experimenter, and subject within
the vicarious reward and punish conditions,
and whether the subhypotheses were statis-
tically verified. As this table indicates, Hy-
pothesis 1 was partly verified. The main ef-
fect of vicarious reinforcement condition
was significant (F = 15.32, df = 2/72, p <
.001). Scheffé multiple comparisons revealed
that the reward (X̄ = 5.10 out of a maximum
of 10) and ignore (X̄ = 4.47) conditions
were not different, but the punishment (X̄
= 1.84) and ignore conditions were (F =
TABLE 1
STATISTICAL VERIFICATION OF HYPOTHESES

                                                          Statistical
Hypothesisᵃ                                               verificationᵇ

1. Vicarious reinforcement condition
   (a) Reward (n = 32) > ignore (n = 32)                      No
   (b) Ignore > punish (n = 32)                               Yes
2. Male Ms, Es, and E-M combinations most salient
   (a) In reward condition:
       (i) Male M (n = 16) > female M (n = 16)                No
       (ii) Male E (n = 16) > female E (n = 16)               No
       (iii) Male E-male M (n = 8) > all other E-M
             combinations (n = 24)                            No
   (b) In punish condition:
       (i) Male M (n = 16) < female M (n = 16)                No
       (ii) Male E (n = 16) < female E (n = 16)               No
       (iii) Male E-male M (n = 8) < all other E-M
             combinations (n = 24)                            No
3. Same-sex M, E, and E-M combinations as S most salient
   (a) In reward condition:
       (i) Male M-male S and female M-female S
           (n = 16) > all other M-S combinations
           (n = 16)                                           Yes
       (ii) Male E-male S and female E-female S
            (n = 16) > all other E-S combinations
            (n = 16)                                          No
       (iii) Male E-male M-male S and female E-female
             M-female S (n = 8) > all other E-M-S
             combinations (n = 24)                            Noᶜ
   (b) In punish condition:
       (i) Male M-male S and female M-female S
           (n = 16) < all other M-S combinations
           (n = 16)                                           No
       (ii) Male E-male S and female E-female S
            (n = 16) < all other E-S combinations
            (n = 16)                                          No
       (iii) Male E-male M-male S and female E-female
             M-female S (n = 8) < all other E-M-S
             combinations (n = 24)                            No


8.84, df = 1/72, p < .01). Significantly more
imitation occurred in the vicarious ignore
and reward conditions than in the punish
condition.

A secondary analysis using t tests com-
pared the sample mean of the imitation
scores for male and female children in the
vicarious reward and punish conditions
against the population mean of 5.00, the
mean value of the imitation scores that
would have occurred if the treatment had
had no effect on the children’s choices. Only
the punish condition yielded means that
statistically differed from the population
value (male children: X̄ = 1.69, t = −6.44,
p < .001; female children: X̄ = 2.00, t =
−4.60, p < .001).
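
The comparison of a group's mean imitation score against the no-effect value of 5.00 is a one-sample t test, which can be sketched as below. The raw scores are not given in the article, so the score list in the example is purely hypothetical, and the function name is illustrative.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores, population_mean):
    """Return (t, df) for a one-sample t test against `population_mean`."""
    n = len(scores)
    standard_error = stdev(scores) / math.sqrt(n)  # stdev uses n - 1
    t = (mean(scores) - population_mean) / standard_error
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical imitation scores (0-10) for one group of children.
    hypothetical_scores = [0, 1, 2, 3, 4]
    t, df = one_sample_t(hypothetical_scores, population_mean=5.00)
    print(round(t, 2), df)
```

A negative t, as in the punish condition, indicates that the children chose the model's picture less often than the five out of ten expected if the treatment had no effect.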

None of the subhypotheses associated with Hy-
pothesis 2 was supported. In the reward
and punish conditions, male experimenters 
and models were not more salient than fe- 
male experimenters and models. 

Hypothesis 3 received partial support in 
the reward condition, no support in the pun-
ish condition. The planned comparisons indi-
cate that in the reward condition models of the
same sex as the subject were more salient for
young children than models of the opposite
sex (F = 4.23, df = 1/72, p < .05); there
was no difference between same- and oppo- 
site-sex models in the punish condition. 
There was no difference between same- and 
opposite-sex experimenters as the subject in 
either the reward or punish condition. 

Because of the small cell size (n = 4) in 
the interaction of vicarious reinforcement 
condition, sex of the experimenter, sex of the 
model, and sex of the subject, multiple-com-
parison tests among cell means would lack
power. However, an examination of the
means associated with this interaction may
disclose some interesting trends. Table 2


Note. M = model, E = experimenter, S = sub-
ject.

ᵃ The number in parentheses indicates the num-
ber of subjects involved in that portion of the
comparison.

ᵇ Hypotheses 1a and 1b were tested by means
of Scheffé multiple comparisons; all other hy-
potheses were tested by means of planned or-
thogonal comparisons, using the mean square
error term from the full factorial analysis of vari-
ance as the best estimate of the pooled error term
for the comparisons (Winer, 1971).

ᶜ Although not statistically significant, the
direction of the means is in accord with this sub-
hypothesis.
TABLE 2
MEANS AND STANDARD DEVIATIONS FOR THE INTERACTION OF
CONDITION AND SEX OF EXPERIMENTER (E), MODEL (M), AND SUBJECT (S)
(n = 4)
Condition
Vicarious reward Ignore Vicarious punish
Male S Female S Male S Female S Male S Female S
Male M-Male E 
x 6.50 4.25 4.00 
E : ` 3.50 2.00 3.50 
SD 2.38 3 
fee T 2.63 2.45 3.00 1.41 4.73 
x 6.25 4.50 5.00 
$ : 3.50 2.15 1,75 
SD 4.35 4 
Pee o 1.29 2.45 3.11 2,75 1.71 
X 4.25 5.00 3.50 5.00 
SD 3.40 0. : : d 96 
Female M-Female E S um ji E 
X 3.75 6.25 5.50 5.75 
: 3 h i 1.26 2.00 
SD 1.50 3.30 2.08 1.50 2.50 1.03 


presents these means and standard devia- 
tions. As the table indicates, the highest imi- 
tation scores for the reward condition were 
obtained by female children exposed to fe- 
male models and experimenters and male 
children exposed to male models and experi- 
menters. So the direction of the data is in 
accord with the corresponding subhypothe- 
sis. For male and female children in the 
punish condition, the greatest counterimita- 
tion occurred with a male experimenter and 
female model. This trend is not in the direc- 
tion specified in the corresponding subhy- 
pothesis. 

The effect of the different models and ex- 
perimenters was explored by examining the 
means and standard error of the means asso- 
ciated with each participant in each rein- 
forcement condition. The mean scores of 
each participant generally were within ±1
standard error of the other participants. The 
specific individuals involved in the study as 
experimenters and models did not seem to 
make much difference in the children’s behavior.
However, one female was particularly
ineffective; her imitation scores in all
three conditions were approximately the 
same. 


Discussion 


The effects of vicarious reward and punishment
were asymmetrical: that is, only
punishment produced a significantly different
number of imitation responses from the
ignore condition. It is possible that the subjects
in the various treatment groups may
have interpreted the experimenter's silence
with respect to their choices differently. A
subject who had observed the experimenter 
lavishly reward the model may have interpreted
the experimenter's silence as direct
punishment, which could have inhibited the
subject's imitation responses. A subject in 
the punishment condition may have inter- 
preted the experimenter's silence as an indi- 
cation that his or her responses were correct, 
and hence, the subject’s counterimitative re- 
sponses would have persisted. This inter- 
pretation is consistent with the literature on 
the direct consequences of telling subjects 
that their responses are right, wrong, or not 
commenting (e.g., Buss, Braden, Orgel, &
Buss, 1956). The general finding in these 
experiments was that either the combination 
of “right” and “wrong” or no comment and 
"wrong" produced more rapid learning than 
when the subject was told “right” when he or 
she was correct and no comment when he 
or she was wrong. One interpretation of these 
results is that no comment acquires positive
reinforcing properties in the combination of
no comment and “wrong” and negative reinforcing
properties in the combination of
“right” and no comment (Buchwald, 1959).
It is possible that similar effects were operating
in the present experiment, where the reinforcement
was vicarious rather than direct.
Travers, Van Wagenen, Haygood, and McCormick
(1964) have documented such effects
for vicarious learning situations.

This explanation seems plausible, but it 
is not consistent with the fact that research 
using a very similar methodology to the one 
employed here produced more symmetrical 
effects for vicarious reward and punishment 
(e.g., Liebert et al., 1972). Explication of
these inconsistencies requires further theo- 
retical analysis and experimentation. Future 
experimentation should attempt to control 
these possibilities through careful designs. 

The remainder of our hypotheses received 
only partial support. There was no tendency 
for males or females to serve as more salient 
models and experimenters when vicarious 
reinforcement was involved. Sex of the 
model and/or the experimenter alone does 
not seem to be a relevant variable in this 
situation. However, the match between sex 
of the subject and sex of the experimenter 
and model does seem to be relevant. The
reduction in overall variance caused by the 
failure of the children in the reward condi- 
tion to engage in much imitation is, un- 
doubtedly, one reason why the effects were 
not as strong as expected. Nevertheless, cer- 
tain trends in the data do reveal interesting 
effects. In the reward condition, like-sex
models and experimenter-model combina- 
tions did tend to produce more imitation, as 
predicted. In the punish condition, however, 
the combination of male experimenter-female
model produced the most counterimitation
in both sexes. Like-sex imitation, therefore,
appears most likely in positive,
nonthreatening situations. However, in 
somewhat threatening, negative conditions, 
the child reacts most strongly to the experi- 
menter-model combination that resembles 
a traditional stereotype of home life: an in- 
strumental and authoritarian father defining 
as incorrect certain behaviors of a somewhat
passive mother. To what extent this stereo- 
type typifies the actual family relationships 
of the children involved is unknown. One 
might expect that such a relationship would 
be more prevalent among lower-class rather 
than middle- or upper-class children, but our
sample size was not sufficient to test this
assertion. Future research of this type should
attempt to ascertain the relationship between
the subject's performance on modeling
tasks and the type of family relationships
which the child or others perceive as existing
in the home.

Thus, this experiment, while it contains
certain departures from our expectations,
does demonstrate that social learning theory
and experimental methodologies derived
from it can provide useful information concerning
the development of sex-role behaviors.
Refinements of the present experiment
and inclusion of other variables such as
children’s perception of parental dominance,
views of traditional sex-role stereotypes,
birth order, locus of control, etc., promise to
yield meaningful information concerning
sex-role development. This information,
combined with other recent research developing
a similar theoretical viewpoint (e.g.,
Cook & Smothergill, 1973), should lead to
the construction of a coherent and useful
model of sex-role development and to the
discovery of means to alter environments,
educational and otherwise, that limit individual
opportunity.
The study sought to examine the relationship between achievement
and a number of variables shown to be associated with learning ability,
including IQ, to determine their predictive effectiveness. Sixty-two
first-grade black children were individually tested on the Wechsler
Intelligence Scale for Children and tasks from three learning assessment
strategies: (a) learning potential strategy using the Raven's
Coloured Progressive Matrices in a pretest-coaching-posttest format,
(b) diagnostic teaching, and (c) paired-associate learning under three
conditions designed to facilitate learning. IQ correlated moderately
with achievement. In general, diagnostic teaching exceeded IQ in predictive
effectiveness. Prospects for a more precise determination of the
learning potential in young children were discussed.


The limitations of the intelligence test lie 
not only in its inability to accurately differ- 
entiate the learning potential of children, but 
in the value of the information for pre- 
scribing educational change. This increasing 
concern about the predictive and diagnostic 
effectiveness of the IQ test has been focused 
primarily on the performance of subcultural 
populations, undoubtedly because of the conviction
that its use often leads to highly
questionable social and educational practices.

Several approaches to the assessment and 
prediction of learning ability have been 
advocated to circumvent the limitations of 
the intelligence test. Among these are the
development of systematic teacher observa- 
tional techniques and the use of achievement 
tests (Loretan, 1965; Yourman, 1964). Other 
areas of investigation that seem to hold
promise for advancing beyond the limita- 
tions of IQ are (a) learning potential assess- 
ment strategy, (b) diagnostic teaching, and 
(c) paired-associate learning. 

Focusing almost exclusively on the edu- 
cable mentally retarded, Budoff and his as- 
sociates (Budoff & Friedman, 1964; Budoff, 
Miskin, & Harrison, 1971) developed a 
strategy geared to assess the learning potential
of individuals from non-middle-class
backgrounds by exposing the individuals to 
actual learning experiences. The strategy 
was based on the hypothesis that by offering 
the relevant learning opportunity through 
systematic exposure to nonverbal reasoning
tasks, educable mentally retarded children 
could be differentiated more precisely than 
by the verbally biased IQ tests. On the Kohs
Block Design Test, Budoff identified children as
“high scorers,” “gainers,” and “non-gainers”
on the basis of their initial high scores and 
ability to profit from a coaching experience. 
In contrast to the non-gainers with equiva- 
lent IQ, the high scorers and gainers, who 
were shown to be associated with disad- 
vantaged background, were found to be 
educationally rather than mentally retarded. 
In essence, learning potential status was 
more predictive of learning ability with 
lower-socioeconomic children than was the
traditional IQ test. 

The pressing need to recognize individual 
differences in learning ability not tapped by 
intelligence has stimulated some consideration
of a diagnostic teaching approach to
assessment (Meyer & Hammill, 1969; Rosenberg,
1968; Severson, 1971). The advocates
of diagnostic teaching are emphatic in
their objective of shifting from an examina- 
tion of static cognitive abilities toward the 
assessment of the process learning behavior 
of the child. More specifically, this involves 
an assessment of the individual differences 
that significantly influence the child’s
learning skills and his learning style. Empir- 
ical studies to determine the predictive and 
diagnostic value of diagnostic teaching are 
still in an exploratory stage. One such study 
is that of Jones (1970) who investigated the 
relationship between learning ability and 
achievement. She reported a correlation of
.73, which underscores the need for further
examination of diagnostic teaching in com- 
parison with other variables shown to be 
associated with learning ability. 

Jensen (1961) initiated awareness of the
importance of looking at actual learning 
abilities of subcultural populations primarily 
because of the influence of past learning on 
IQ performance and the pronounced differ- 
ences in environmental conditions for school- 
related learning between different socio- 
economic status (SES) groups. In the efforts 
to separate the individual's past learning and 
current learning abilities as reflected in the
IQ score, the method of assessing the learn- 
ing ability using paired-associate tasks 
which provide direct measures of learning 
abilities was conceived. The wide disparity 
between blacks and whites, and between lower- and upper-socioeconomic
classes, in measured intelligence
is not reflected in an equally discrepant
learning ability when learning is
measured directly by performance on paired- 
associate tasks (Green & Rohwer, 1971; 
Rohwer, Ammon, Suzuki, & Levin, 1971;
Rohwer, Lynch, Levin, & Suzuki, 1968; 
Semler & Iscoe, 1963). Intrigued by these 
findings, psychologists and educators began 
to question the relationship between laboratory-based,
paired-associate learning tasks
and classroom learning. The resulting research
has shown a moderate relationship between
paired-associate learning and school achieve- 
ment (Giebink & Goodsell, 1968; Lambert, 
1970; McCullers, 1965; Rohwer & Levin,
1971; Stevenson, Hale, Klein, & Miller, 
1968). 

In addition to having the child engaged 
directly in learning in order to assess his 
learning ability, others have asked whether 
the child could improve his learning ability 
as a function of training (Davidson, 1964; 
Rohwer, 1971). The appropriateness of 
training activities to enhance learning can 
be seen from the observation that the cul- 
turally disadvantaged child on nonverbal 
tests reveals an absence of the mediational 
tendency found among the middle- and 
upper-class children. Jensen (1966) noted 
that the tendency to verbally mediate is one 
aspect of learning ability that is susceptible 
to environmental influence. Essentially the 
same observation was made by Rohwer 
(1971) who pointed out that differences in 
paired-associate performance by different 
SES groups can be attributed to differences 
in spontaneous elaboration by children of 
different populations. 

The effectiveness of mediators in facilitating
paired-associate learning has been
adequately demonstrated (Davidson, 1964; 
Jensen & Rohwer, 1963; Paivio & Yuille, 
1967; Rohwer & Ammon, 1971). Consequently,
if one assumes that the culturally
disadvantaged child has had less than an 
optimal opportunity for the development of 
his verbal facilities, then conceivably, per- 


process would not only be important to 
school achievement as Jensen (1966) sug- 
gested but could provide invaluable educa- 
tional diagnostic information. 

The present study is an attempt to exam- 
ine the relationship between achievement 
and a number of variables shown to be associated
with learning ability, including the individually
administered intelligence test, to
determine their predictive effectiveness.
A secondary objective is to investigate the
interrelationships between the three methods 
of assessing learning ability (learning potential
assessment, paired-associate learning, and diagnostic
teaching) selected to focus on the
shift from the assessment of cognitive abilities
by putatively culturally biased instruments
to the emphasis on learning ability assessed
by direct engagement in learning.


METHOD 


Subjects 


Sixty-two black children (31 boys and 31 girls) 
were randomly selected from five first-grade class- 
rooms of a public elementary school in Milwaukee, 
Wisconsin. Subjects were from lower-socioeco- 
nomic backgrounds determined by criteria of 
income, residential area, and housing type. From 
the original sample, two subjects were excluded be- 
cause of apparent sensory handicaps and three 
transferred to different schools before all tasks 
were administered. The age range was 5 years 10
months to 7 years 5 months with a mean of 6 years 
4 months. 


Materials and Procedure 


Learning potential assessment strategy. This
approach (Budoff & Friedman, 1964) involves an
initial administration (pretest) of the Raven’s 
Coloured Progressive Matrices (CPM), Sets A, 
AB, and B, followed by training to achieve the fol- 
lowing objectives: (a) help the child to attend to 
all choices presented in each problem; (b) en- 
courage reflective thinking rather than impulsive 
choosing of an answer; and (c) aid in developing
an accurate analogy in the 2 × 2 format employed.
The training format consisted of a number
of important components and strategies. No actual 
j 3 training, and examples 
consisted of pictures of real objects (tree, car, flag, 
etc.). Great effort was made to explain the problem-solving
strategies, and motor involvement was
ensured by having the child draw the item that
completes the pattern before choosing the various 


presented. Following training,
a posttest was administered.


workbooks for first-grade children. Subjects were 


response. The rate of presentation on both study
and test trials was five seconds, with a two-minute
interval between trials. The number of correct responses
on the test trial was used as the dependent
variable.

Condition 1 involved presentation of side-by-side
picture pairs on an 8½ × 11 page. Children
were instructed to associate each pair so they could
remember the second one when the first was
presented, and each picture pair was shown once.

Condition 2 was identical to Condition 1 except 
that subjects were instructed to generate an elabo- 
ration strategy to enable them to remember the 
pair (e.g., “Make a sentence out of the paired ob- 
jects”). 

Condition 3 was an imposed elaboration condi- 
tion wherein a picture was shown depicting an 
interaction between both objects in addition to the 
side-by-side picture. The investigator then read a 
sentence which linked the picture pair in the interaction
(e.g., “The present covers the leaf”). Both
pictures were presented on the same page and
instructions were provided on the first trial only.

Diagnostic teaching. The material for the exam- 
ination of the changes in behavior of the child 
under different conditions of reinforcement was 
15 words unknown to the subjects. They were 
presented a list of 25 words of approximately equal 
learning difficulty in order to select 15 which the 
child could not decode. The words were slightly 
idiomatic in order to reduce probability of famil- 
iarity, but chosen so they would be in the speaking 
vocabularies of most first-grade children (Sever- 
son, 1971). 

Five words were selected for each of the three 
different teaching conditions which proceeded from 
feedback to social praise to tangible reinforcement.
All words were printed in capital letters on 3 × 5
inch index cards. Each child was exposed to the
words by the following study-test method:


I am going to show you these five words
and tell you what they are. Let's see how
many you can learn when I show them to you.
Okay?
This word is ___ (say slowly). This word
is ___, this is ___, this is ___, and this is
___ (now shuffle them). Let's see how many
you remember.


Under the feedback condition, the investigator's 
reaction was limited to telling the child if the response
was correct or incorrect. A correct response
under the social praise condition brought an
enthusiastic, positive reaction from the investigator.
During Condition 3, the child was given
a piece of candy for every two correct responses
he made. The words under all three conditions
were administered for a total of three complete
trials, or two trials with all words completely
known.

Seven subtests of the Wechsler Intelligence Scale 
for Children (WISC) were administered according
to standard procedure. The WISC and all
three measures of learning ability were individually
administered to all subjects by a black
examiner. The first session was devoted to the 
WISC, the second to the diagnostic teaching, and 
to the pretest and coaching on the Raven's CPM. 
The third session was devoted to the posttest on 
the Raven’s CPM and the paired-associate tasks. 
The third session was, in all cases, on the day following
the second session. Testing began the last
week of September and continued through the first 
week of January. Achievement criterion was the 
Word Reading and Arithmetic subtests of the
Stanford Achievement Test administered the third 
week in April. 


RESULTS 


All achievement, WISC, and learning ability
variables were intercorrelated by product-moment
correlations. The results are
summarized in Table 1. 

From Table 1 it can be seen that intel- 
ligence is related to all three conditions of 
diagnostic teaching, paired-associate learn- 
ing, and to both pre- and posttests of the 
Raven’s CPM. Interestingly, the imposed 
elaboration condition of the paired-associate 
testing and the posttest following training 
on the Raven’s CPM produced substantial 
shifts in the relationships to intelligence and 
diagnostic teaching. The observed correla- 
tional differences between the Raven's CPM 
pre- and posttests and the other ability vari- 
ables are in line with expectation and sup- 
port the diagnostic significance of this ap- 
proach. The three conditions of diagnostic 
teaching are moderately related but still 
show considerable independence considering 
the degree of similarity in each task. 

Although the Verbal and Performance 
IQs are not as highly related to achievement
as Full Scale IQ, there is clear evidence to
suggest that intelligence relates to achievement
at a highly significant level (p < .001)
for black children. Of particular importance 
in this study is the relationship between the 
three subtests of diagnostic teaching and 
achievement. It is apparent that five of the 
six reported correlations between diagnostic 
teaching and achievement were higher than
the relationship found between intelligence 
and achievement. Although the correlations 
were generally higher, only one relationship 
was significantly higher, that of social praise 
to arithmetic compared with intelligence to 
arithmetic (t = 2.10, p < .05). The pretest
and posttest difference in the correlation
of the Raven’s CPM with achievement is one
of the most intriguing results of the study.
The strength of the relationship between 
posttest and achievement (r = .47 and .54, 
p < .001) is derived primarily from the gain 
scores (post- minus pretest). The pretest 
scores on the Raven's CPM show a weak
relationship to achievement (r = .22 and
335). 
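The comparison reported above between two correlations that share the arithmetic criterion (social praise with arithmetic versus intelligence with arithmetic) calls for a test of dependent correlations. The source does not say which test produced t = 2.10; the sketch below uses Hotelling's t as one common choice, and the function name, the three correlations, and the sample size are all hypothetical.

```python
import math


def hotelling_t(r_xy, r_zy, r_xz, n):
    """Hotelling's t for two dependent correlations sharing variable y.

    r_xy, r_zy: correlations of predictors x and z with the criterion y.
    r_xz: correlation between the two predictors.
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix of x, z, and y.
    det = 1 - r_xy**2 - r_zy**2 - r_xz**2 + 2 * r_xy * r_zy * r_xz
    t = (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))
    return t, n - 3


# Hypothetical values (NOT taken from the article's Table 1).
t, df = hotelling_t(r_xy=0.60, r_zy=0.40, r_xz=0.50, n=57)
print(f"t({df}) = {t:.2f}")
```

Because both correlations come from the same children, a test for independent correlations would be inappropriate here; the shared-variable form takes the predictor intercorrelation into account.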

The paired-associate tasks, either ini- 
tially, following instruction to elaborate, or 
with imposed elaboration, show very weak 
relationships with achievement. 


Factor Analysis 


All ability and achievement variables 
were subjected to a principal component
analysis with varimax rotation and compo- 
nents with eigenvalues of one or greater 
were retained. The six factors that emerged 
are summarized in Table 2. 

Factor 1 is characterized by high loadings 
on IQ, diagnostic teaching, and achievement.
Factor 2 loads highest on the Raven's CPM 
gain scores, with achievement and verbal
intelligence showing significant loadings. 
This suggests that the gain score is an im- 
portant component of process learning and 
something different from the skills in diag- 
nostic teaching that are being reflected. Factor
4 shows a high loading on all conditions
of elaboration on the paired-associate tasks,
but no loading of any other variable, indicating
the three tasks are unique measures
virtually unrelated to intelligence or learning
in a school-related sense. Factor 5 is a
nonverbal IQ factor showing loadings both
on Performance IQ and the initial Raven's
CPM scores; it is age and sex related, and
it also shows a loading with word reading.
Factor 6 seems to be a developmental factor,
most highly related to age.


DISCUSSION 


The primary objective of the present 
study was to determine the predictive validity
of the IQ and a number of measures derived
from three strategies of assessment
based on direct engagement in learning. If
IQ is the best index of learning ability for
black children, then that ability should be
TABLE 2


FACTOR LOADINGS FOR ALL VARIABLES (PRINCIPAL COMPONENTS WITH
VARIMAX ROTATION) 


Factors 
Variable 


Sex —.60 
Age —.44 72 
Verbal IQ 40 —.49 
Performance IQ 

Feedback 

Social praise 

Tangible reinforcement 
PA—No elaboration 
PA-—Instruction to elaborate 
PA—Imposed elaboration 87 
Pretest Raven’s CPM .30 .45 .68 
Posttest Raven's CPM 81 .30 
Learning potential gain score 91 

Reading .08 .34 .36 
Arithmetic .73 43 
Percentage of total variance 27.9 12.7 1.7 10.9 10.3 1.9 


Note. ±.30 and above factor coefficient used as criterion of acceptance. PA = paired-associate;
Raven’s CPM = Raven’s Coloured Progressive Matrices. 


reflected in a high correlation with achievement.
On the other hand, if the premise that
learning ability is best assessed by having
the child engage directly in learning tasks is correct,
then a substantial correlation with achievement
should be expected from these strategies.

The results are clearly consistent with reported
predictive effectiveness of the IQ
(Jensen, 1969). However, if the prediction
of achievement is the primary purpose for
testing, diagnostic teaching with its criterion-referenced
quality seems decidedly
preferable to the IQ in the light of these results.
Diagnostic teaching, which emphasizes
a shift from static cognitive abilities to a
more process-oriented assessment, is as effective
a predictor of achievement as the IQ.
In fact, it accounted for approximately twice
the variability in arithmetic as the IQ measure.

The identification of a possible alternative
predictive measure has implications
for the discussion concerning intelligence testing
of low-SES children. If the IQ is accepted
as an accurate index of learning ability,
inferentially there would be a discernible
similarity in learning potential between
lower-class and middle-class children of
equivalent IQ. The available evidence indicates
(Jensen, 1968) that children with low
IQs from middle-class backgrounds are invariably
slow learners in contrast to low-SES
children of equivalent IQs who demonstrate
a wide range of learning ability.
Although the predictive effectiveness of the
low mean IQ (87) is evident, the indications
that it is not an accurate index of learning
ability for blacks (Semler & Iscoe, 1963)
and the indications that “the labeling of a
child with a ‘permanent’ stratification index
(IQ) is likely to affect his self-concept, his
goals, his motivations and his achievement
[Yourman, 1964, p. 109]” should militate
against widespread use with black children.

Perhaps most intriguing in this study were
the results of the learning potential strategy
used in this context. The Raven's CPM has
been referred to as a test which measures
“‘g’ or general intelligence in nearly pure
form” (Jensen, 1969) since it presumably
minimized cultural and scholastic content.
In the present study, it showed a weak relationship
to [...] learning potential [...]
its effectiveness in [...]
ability to profit from instruction was clearly
demonstrated in the posttest results follow- 
ing training. Whereas the initial testing on 
the Raven's CPM accounted for 6% of the 
variability in achievement, posttesting fol- 
lowing teaching of the relevant concepts to 
solve the problems accounted for approxi- 
mately 25%. Interestingly, the difference be- 
tween the pretest and posttest (labeled 
“Learning Potential Gain”) was as effective
as the posttest in its predictive
power. The strategy did not exceed the IQ
test in predictive potential; nevertheless, the
procedure does lend itself to the prospect of 
better differentiating those who can profit 
from learning experience than the IQ mea- 
sure. 
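The percentages of variability cited above follow from squaring the correlation coefficient. As a rough check under that assumption, the sketch below squares the two posttest correlations reported in the Results (r = .47 and .54); the function name `percent_variance` is illustrative, and averaging the two shares is an assumption about how the "approximately 25%" figure was reached.

```python
def percent_variance(r):
    """Percentage of criterion variance accounted for by a correlation r."""
    return 100.0 * r * r


# Posttest Raven's CPM correlations with the two achievement measures,
# as reported in the Results section.
posttest_rs = [0.47, 0.54]

shares = [percent_variance(r) for r in posttest_rs]
mean_share = sum(shares) / len(shares)

for r, share in zip(posttest_rs, shares):
    print(f"r = {r:.2f} accounts for {share:.1f}% of the variance")
print(f"mean share = {mean_share:.1f}%")
```

The two squared correlations bracket the rounded figure given in the text, which is consistent with the stated "approximately 25%".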

Optimism regarding the prospect of 
paired-associate tasks adding meaningfully 
to the prediction of academic achievement
has been increased by a number of studies (Lambert,
1970; McCullers, 1965; Rohwer &
Levin, 1971). Despite the significant correlation
found between one condition of the
paired-associate (instruction to elaborate)
and reading (r = .248, p < .05), the present
factor analysis and correlational data do not
support the findings of a substantial correlation
between paired-associate learning tasks
and achievement. In light of the generally
low correlations, these findings offer support
to Rohwer, Ammon, Suzuki, and Levin (1971),
who argued the possibility “that paired associate
tasks do not elicit the kinds of learning
processes necessary for performance on
school learning tasks [p. 13].”

Because the patterns of predictive rela- 


which measure the actual learning process, 
factor analysis has contributed to the deter- 


strongly related to language, and very possi- 
bly reflects a verbal mediation skill, as op- 
posed to the rote learning skill reflected in 
the first factors. None of the subsequent factors
seem very important when considering
achievement. They add support for the com- 
plete independence, and possible insignifi- 
cance, of the paired-associate tasks when 
considering the kinds of achievement skills 
important for first-grade, low-SES black 
children. 

Although this study did not show the 
WISC test to be less predictive of a black 
sample than is commonly reported for white 
children, it did offer added information re- 
garding the value of a shift toward a cur- 
riculum-based assessment. It also under- 
scores the value of studying the child as he 
actually learns, as opposed to simply assess- 
ing what the child knows from past experi- 
ences. By extending these two qualities, 
observing the child under conditions of stan- 
dardized learning, and by selecting material 
from the actual body of material to be 
learned by the child in the curriculum, some 
of the limitations of the IQ test should be 
circumvented. The present procedure does 
not deal with the issue of which curriculum 
is the most appropriate for a given subculture,
but it raises vital questions concerning
the practice of testing children on materials 
where the possibility of content irrelevance 
is considerable and where the historical ex- 
periences of the child have poorly equipped 
him to deal with standardized tests. 
LOCUS OF CONTROL AND APTITUDE TEST SCORES AS
PREDICTORS OF ACADEMIC ACHIEVEMENT¹


WALTER R. NORD,² FRANCIS CONNELLY, AND GEORGE DAIGNAULT


Washington University


The experimenters studied the relationships among academic achieve- 
ment in graduate school, perceived locus of control, and a relevant ap- 
titude test. Grades in 15 individual courses and overall grade point 
average (GPA) were used as the criteria of academic success. Three 
I-E scores (total scores and ideological and personal control subscores)
and the Admissions Test for Graduate Study in Business (ATGSB) 
scores of 50 entering MBA students were used as predictors. It was 
found that both the ATGSB and I-E scales accounted for significant 
and often complementary portions of variance in GPA and grades 
in individual courses. Each instrument predicted better in certain 
courses and the predictive power of each instrument tended to re- 
main when influence of the other was controlled statistically. 
Analysis of achievement as a function of the two I-E dimensions re- 
vealed that personal control tended to be a better predictor of 
achievement than was ideological control. It was concluded that the 
locus of control might be a useful predictor of academic success, but 
that no simple pattern exists. Rather, course content and such factors 
as teacher behavior may interact with perceived locus of control to
determine academic achievement.


The degree to which perceived locus of 
control can be used to predict academic 
achievement has been given increased atten- 
tion in the last few years. Comprehensive 
reviews of the literature (Lefcourt, 1966; 
Rotter, 1966) reported some support for 
Rotter's I-E scale as a predictor of aca- 
demic achievement at the precollege level, 
but little evidence concerning its usefulness 
for college or graduate school populations. 

Recently several investigators have stud-
ied the relationship of perceived locus of
control and performance in college. Eisen- 
man and Platt (1968) found no relationship 
between I-E scores and grades in psychol- 
ogy classes for 131 students, mostly fresh-
men, at the University of Georgia. Ware-
hime (1972), using freshman subjects at the
University of Iowa, found that the I-E scale
had no practical use as a predictor of first-
year grades above what could be accounted
for by IQ alone. Massari and Rosenblum 
(1972) examined the relationship between 
the locus of control and academic achieve- 
ment in college students as measured by a 
multiple-choice final exam for 43 female
and 90 male introductory psychology students at
Temple University. In addition to Rotter's 
I-E scale, these researchers used a re- 
vised version of the Crandall Intellectual 
Achievement Responsibility (IAR) scale
and Rotter's Interpersonal Trust scale.
They reported that neither the I-E nor the
IAR scale was significantly correlated with
academic achievement for men. Moreover,
for women, the correlations were signifi- 
cant but opposite from the predicted direc- 
tion. In other words, the external women got 
higher grades than internal women. Also, 
the authors found no relationship between 
the Scholastic Aptitude Test and the IAR 
or the I-E scales. Hjelle (1970), using in- 
troductory psychology students at Villa-
nova University as subjects, reported “mar-
ginal support” for the hypothesis that
internal students got better grades than
external students (χ² = 298, p < 35).
Eilersen (1972), using 116 community col-
lege students enrolled in introductory psy- 
chology found that internality was predic- 
tive of achievement and participation in 
unstructured (as opposed to structured) 
classes. Thurber (1972) failed to find sup-
port for the hypotheses that the externally
oriented subjects would perform more ade-
quately on examinations based on classroom
lectures and assigned reading material and 
that internal subjects would show greater 
relative achievement on more broadly de- 
fined examinations based on information 
which was available but not specifically 
assigned or discussed in a classroom. 

While most of the studies summarized so 
far have given little support to the useful- 
ness of the I-E scale as a predictor of aca-
demic success in college, several more re-
cently published studies have reported 
stronger relationships. Brown and Strick- 
land (1972), using introductory psychology 
students as subjects, found internality was 
associated with high grades for males but 
not for females. Similar results were re- 
ported by Boor (1973) in a study published
after our analysis had been completed. Boor
found that with intelligence partialed out,
total I-E scores and Mirels' (1970) Factor
II or the political locus of control subscale 
of the I-E were significantly related (r — 
= 23 nt M respectively) to examina- 
tion scores in an introductory psychology 
course for males; the same statistics cal- 
culated on data from female subjects were 
not significant. Finally, using college stu- 
dents as subjects, Gozali, Cleary, Walster, 
and Gozali (1973) reported that internality 
was associated with functional behavior on 
achievement tests.

While the findings are mixed, existing 
evidence seems to indicate that locus of con- 
trol, at least in some circumstances, may be
a useful predictor of academic success in 
college for males. However, most of the 
studies were based on introductory psychol- 
ogy students and often the criterion has 
been limited to either grades in introductory 
psychology or to overall grade point aver-
ages of college freshmen. The relationships
between perceived locus of control and per-
formance at levels beyond the freshman
year and in specific courses other than psy- 
chology have not been studied. 

The present study investigated the I-E 
scale as a predictor of academic success in 
a graduate business school. In addition to 
studying a different population of students 
in a different environment than previous re- 
search, this study sought to examine the I-E 
scale as a predictor for performance in var- 
ious courses as well as to compare its predic- 
tive power with a widely used aptitude test, 
the Admissions Test for Graduate Study in 
Business (ATGSB). Specifically, the study 
sought to determine if a personality variable 
such as the locus of control could improve 
the prediction of overall grade point average 
(GPA) over and above predictions that 
could be made by using a standard aptitude
test (ATGSB). Second, the researchers 
sought to examine the predictive power of 
the I-E scale and the ATGSB for individual
courses. Finally, the study explored the 
ideological and personal control subscales 
of the I-E (Gurin, Gurin, Lao, & Beattie, 
1969) as predictors of grades. 


METHOD 


Subjects 


Full-time students in the introductory Organi-
zational Behavior course in the Graduate School
of Business at Washington University served as
subjects. Students whose native language was not
English and students who did not complete the
first year of the program were excluded from the
analysis, leaving 48 male and 2 female students who
produced usable data.


Procedure 


Students were asked to complete the I-E scale 
during the first class session of Organizational Be- 
havior in the fall of 1971. They were told that 
participation was voluntary, that the results of the 
study would not be examined until they had com- 
pleted the first year, and that the individual results 
would be made available to them during the sec- 
ond year of the program. The I-E scale was scored 
using Rotter's (1966) scoring key. The subscale 
scores were obtained using the scoring system of 
Gurin et al. (1969). Data on the ATGSB and 
course performance were collected from the school 
files. 

The data were analyzed through correlational 
techniques. Pearson product-moment and multiple 
correlations were calculated to predict individual 
course grades as a function of ATGSB and the I-E
total scores. In addition, partial correlation coeffi-
cients were calculated for individual course grades
as a function of the I-E, holding ATGSB constant 
and as a function of the ATGSB, holding I-E score 
constant. The same set of correlations was calcu- 
lated for GPA as a function of ATGSB and I-E 
scores. Finally, the correlations of the ideological 
control and personal control subscales with individ- 
ual course grades, with ATGSB scores removed 
statistically, were calculated. In all cases one-tailed 
tests were used to assess the significance of the cor- 
relation coefficients. 
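
As a rough illustration of how a one-tailed test of a single correlation coefficient can be carried out, the sketch below converts r to a t statistic with n - 2 degrees of freedom. The function names and the critical value of about 1.68 (df = 48, one-tailed alpha = .05, read from a standard t table) are assumptions made for illustration; the article does not describe the computation in this detail.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """Convert a Pearson r based on n paired observations to t with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Assumed one-tailed critical t for alpha = .05 and df = 48 (standard t table).
T_CRIT_ONE_TAILED = 1.68


def is_significant_one_tailed(r: float, n: int,
                              t_crit: float = T_CRIT_ONE_TAILED) -> bool:
    """Compare |t| with the critical value.

    The sign of r is assumed to match the predicted direction; only the
    magnitude is checked here.
    """
    return abs(r_to_t(r, n)) > t_crit
```

Under these assumptions, with n = 50 a correlation has to reach roughly |r| = .24 before the one-tailed test rejects the null hypothesis.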


RESULTS 


The raw, multiple and partial correlations 
of the ATGSB and I-E scores with GPA 
and grades in individual courses are shown 
in Table 1. The bottom line in the table 
shows that both ATGSB and I-E scores 
were significantly related to GPA (r = .38, 
p < .05; r = -.32, p < .05, respectively). In
other words, students who had high ATGSB
scores and those who tended to be internal
achieved higher grade point averages than
students who had low ATGSB scores and
those who tended to be more external. While
the ATGSB and I-E scores were slightly
but not significantly correlated (r = -.18,
p > .05), when I-E scores were partialed
out, the ATGSB scores were still signifi-
cantly correlated with GPA (r = .34, p <
.05); when ATGSB scores were removed
statistically, the correlation of I-E and
GPA remained significant (r = -.27, p <
.05). Moreover, as the significant R² indi-
cates, these two measures taken together ac-
counted for over 20% of the variance in GPA,
as compared to 14% for the ATGSB taken 
alone and 10% for the I-E taken alone. 
Thus, while the ATGSB was a slightly bet- 
ter predictor of GPA than the I-E, the two 
taken together produced better predictions 
than either taken alone. 
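
The partial correlations and the combined variance figure in this paragraph can be checked from the three pairwise correlations alone. The following is a minimal sketch using the standard first-order partial correlation and two-predictor R² formulas; the function and constant names are hypothetical, and the inputs are the rounded values given in the text, so the output is only approximately comparable to the reported figures.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of y on two predictors, from pairwise r values."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Rounded correlations as reported in the text.
R_GPA_ATGSB = 0.38
R_GPA_IE = -0.32
R_ATGSB_IE = -0.18

# ATGSB with GPA, I-E held constant.
atgsb_given_ie = partial_r(R_GPA_ATGSB, R_GPA_IE, R_ATGSB_IE)
# I-E with GPA, ATGSB held constant.
ie_given_atgsb = partial_r(R_GPA_IE, R_GPA_ATGSB, R_ATGSB_IE)
# Proportion of GPA variance accounted for by both predictors together.
r2_both = r_squared_two_predictors(R_GPA_ATGSB, R_GPA_IE, R_ATGSB_IE)
```

With these rounded inputs the sketch gives partial correlations of about .35 and -.28 and an R² of about .21, close to the values reported in the text; small differences are to be expected because the inputs carry only two decimals.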

The correlations of the ATGSB and I-E 
scores with grades in individual courses are 
also presented in Table 1. The simple corre- 
lation coefficients followed a pattern similar 
to that of overall grade point averages. 
Grades in most of the courses were posi- 
tively correlated with the ATGSB scores 
and negatively correlated with I-E scores. 
While the size of the correlations was not 
large for either of the instruments, on the 
average the ATGSB appeared to slightly 
outpredict the I-E scale. Twelve of the 15 
correlations between ATGSB scores and 
course grades were positive; 8 were statis- 
tically significant. Fourteen of the 15 cor- 
relations between I-E scores and course 


TABLE 1 


CORRELATIONS OF I-E, ADMISSIONS TEST FOR GRADUATE STUDY IN BUSINESS (ATGSB), 
AND GRADES 


Course                         |  n |  r: ATGSB |  r: I-E | R² (R): ATGSB + I-E | Partial r: ATGSB | Partial r: I-E
Managerial accounting          |  … |  …        |  …      |  …                  |  …               |  …
Financial accounting           |  … |  …        |  …      |  …                  |  …               |  …
Economics (micro)              | 48 |  …        |  …      |  …                  |  …               |  …
Economics (macro)              | 49 |  …        |  …      |  …                  |  …               |  …
Computer programming           | 49 |  .24*     |  …      |  …                  |  …               |  …
Marketing                      |  … |  …        |  …      |  …                  |  …               |  …
Organizational behavior        |  … |  …        |  …      |  …                  |  …               |  …
Quantitative courses
Linear programming             | 50 |  .34*     | -.29*   |  .17* (.41)         |  .31*            | -.25*
Calculus                       |  … |  …        |  …      |  …                  |  …               |  …
Probability theory             |  … |  …        |  …      |  …                  |  …               |  …
Statistics                     |  … |  …        |  …      |  …                  |  …               |  …
Finance                        |  … |  …        |  …      |  …                  |  …               |  …
Management simulation          | 49 |  …        |  …      |  …                  |  …               |  …
Management information systems | 47 |  …        |  …      |  …                  |  …               |  …
Production                     | 50 |  …        |  …      |  …                  |  …               |  …
Overall grade point average    | 50 |  .38*     | -.32*   |  …                  |  .34*            | -.27*


* p < .05, one-tailed tests. 
grades were negative, 6 being statistically
significant. The patterns of the partial cor-
relations were similar, with the ATGSB
being significantly positively related to
grades in five courses and the I-E being sig-
nificantly negatively related to grades in
five courses. The ATGSB outpredicted the
I-E in the quantitative, the financial, and 
one of the accounting courses. The I-E pre- 
dicted well for both accounting courses, but 
had its greatest power for predicting grades 
in marketing and production. 

Seven of the multiple correlations using 
ATGSB and I-E scores as predictors of 
course grades were significant. In four 
courses these two predictors accounted for 
over 20% of the variance in grades. 

In general, the data on individual course 
grades supported the expectations that both 
instruments would be predictive of grades. 
However, both predictors seemed to be more 
successful in some courses than others. 
Moreover, both instruments accounted for 
complementary portions of the variance in 
grades in many of the courses. 

Finally, the data were examined to deter- 
mine if the subscales of ideological and per- 
sonal control were related to grades in dif- 
ferent courses. Table 2 shows the correla- 
tions between grades and I-E scores with 
ATGSB scores partialed out. Ten of the 15 
grades in individual courses were negatively
related to ideological control scores; how-
ever, only the correlations for marketing
(r = -.44, p < .05) and production (r =
-.38, p < .05) were statistically significant.
Thirteen of the 15 correlations between
personal control scores and course grades
were negative; 5 of these were statistically
significant. Thus it appeared that there was
a tendency on both ideological and personal
control dimensions for internal subjects to
achieve higher grades. However, the per-
sonal control dimension seemed to be a
somewhat better predictor of grades than 
the ideological subscale. The correlations 
between the two subscales and GPA are 
shown on the bottom line of Table 2. While 
internal subjects tended
to achieve higher GPAs, only the correlation
between personal control and GPA was sig-
nificant (r = -.27, p < .05).
TABLE 2

PARTIAL CORRELATIONS OF I-E SUBSCALES WITH GRADE POINT AVERAGE AND
GRADES IN INDIVIDUAL COURSES, CONTROLLING FOR ADMISSIONS TEST FOR
GRADUATE STUDY IN BUSINESS SCORES

                                          Partial correlations with I-E subscales
Course                         |  n | Ideological control | Personal control
Managerial accounting          | 50 | -.18                | -.38*
Financial accounting           | 46 | -.15                | -.28*
Economics (micro)              | 48 |  .08                | -.06
Economics (macro)              | 49 | -.03                | -.12
Computer programming           | 49 |  .09                | -.08
Marketing                      | 49 | -.44*               | -.50*
Organizational behavior        | 48 | -.21                |  …
Quantitative courses
Linear programming             | 50 | -.08                | -.24*
Calculus                       | 46 |  .10                | -.07
Probability theory             | 46 | -.16                | -.13
Statistics                     | 40 |  .21                |  .18
Finance                        | 46 |  .15                |  .07
Management simulation          | 49 | -.19                | -.18
Management information systems | 47 |  …                  | -.16
Production                     | 50 | -.38*               | -.33*
Overall grade point average    | 50 |  …                  | -.27*


* p < .05.


The personal control subscale appeared to be the
better predictor of academic performance.


DISCUSSION 


In interpreting the results of this study, 
several limitations must be considered. 
First, almost all of the students were males. 
Second, it must be remembered that the 
students had already entered graduate busi- 
ness school. Consequently, the predictive 
power of the ATGSB is apt to be under- 
stated in comparison to what it would be for 
the population of applicants to business 
school. Also, the students were to some 
degree selected on other characteristics such 
as undergraduate grades and recommenda- 
tions. To the degree (as yet unknown) that 
these measures are a function of locus of
control, the predictive power of the I-E 
scale may be understated. Finally, grades 
are only one criterion of academic success 
and are rife with problems of reliability and 
validity. 

While these limitations mean that no firm 
statements can be made about the relative
predictive power of the two instruments for 
selection, the data did yield some poten- 
tially useful results. First, the study con- 
firmed previous research (Connelly & Nord, 
1972) as to the magnitude of the correlation 
of the ATGSB with grades in MBA pro- 
grams. In addition, despite a small insignifi- 
cant correlation between I-E scores and the 
ATGSB, the predictive power of both scales
tended to remain reasonably strong even 
when the influence of the other was par- 
tialed out. In other words, the ATGSB and 
the I-E scale, to some degree, measured dif- 
ferent dimensions. However, both dimen- 
sions were related to success in graduate 
school. While the absolute magnitudes of the 
correlations for both instruments were low, 
they were large enough to have some prac- 
tical use. Moreover, since the predictive 
power of the instruments complemented 
each other, admissions decisions of profes- 
sional schools could be improved by a rela- 
tively inexpensive personality measure, such 
as locus of control, in addition to the stand- 
ard aptitude tests. Overall, these results 
added to the growing number of studies 
which have found a positive relationship be- 
tween internal locus of control and personal 
achievement. 

Perhaps the most interesting findings 
were the differences between the power of 
two instruments to predict grades in differ- 
ent courses. The ATGSB predicted better in 
the quantitative courses and to some degree 
in accounting. The I-E scale was more pre- 
dictive in marketing; also it predicted well 
in accounting, production, and even one 
quantitative course. Quite probably a vari- 
ety of factors including subject matter, 
organizational climate, and the behavior, 
personality, and grading policies of profes- 
sors influence which student characteristics 
are associated with success. Some support 
for this suggestion comes from Runyon’s 
(1973) study. Runyon found that satisfac- 
tion with participative and directive super- 
visory styles in industry was related to the 
perceived locus of control of the subordi- 
nates. Similar findings for satisfaction with 
teaching style might be expected. 

The results of the present study may also
help to explain the conflicting findings
among previous studies of the relationship
between locus of control and grades. For the
most part, each of these studies was based
on only one course (often introductory
psychology), grades assigned by only one or 
a very few teachers, and one criterion such 
as an objective test. The data from the cur- 
rent study suggest that the I-E is a better 
predictor in some courses than in others. A 
search for factors which influence the degree 
to which the I-E scale predicts academic 
success appears warranted. 

Finally, the current data were consistent 
with the view that the I-E scale is not uni- 
dimensional. For the most part, personal 
control was a better predictor of achieve- 
ment than ideological control. However, the 
direction of this relationship was contrary 
to Boor’s (1973) finding that political con- 
trol was predictive of grades but personal 
control was not. Two differences between 
Boor’s and the current study might be 
responsible for these contradictory findings. 
First, Boor’s subscales were scored using 
Mirels' (1970) system; the present study
employed Gurin et al.’s (1969) scoring pro- 
cedure. Second, Boor's criterion was limited 
to grades in psychology; the present study 
used 15 courses as criteria. However, one of 
these courses, organizational behavior, in- 
cluded a large amount of psychological ma- 
terial. Grades in this course, as in Boor's 
study, tended to be negatively correlated 
with ideological control (r = -.21, p < .10).
Perhaps then, the relationship of the sub- 
scales of locus of control to academic 
achievement is a function of course content. 

In conclusion, it was not surprising that
measures of aptitude and personality were 
predictive of academic achievement. How- 
ever, the nature of this relationship was 
more complex than previous researchers 
seem to have assumed; both the measures 
used in this study predicted better for some 
courses than for others. The fact that the 
I-E scale was a better predictor of achieve-
ment in some courses than was the ATGSB
and the complementary nature of the two 
instruments for predicting grades suggest 
that admissions officers might find it useful 
to evaluate the value of personality tests. 


However, the results of the present study
were based on data from only one school, 
used only grades in first-year courses as 
criteria, and left a large portion of the vari- 
ance in grades unexplained. 
AN EXPLORATION OF TWO CORRECTION PROCEDURES
USED IN MASTERY LEARNING APPROACHES
TO INSTRUCTION


JAMES H. BLOCK AND MICHAEL L. TIERNEY


University of California, Santa Barbara 


This study investigated the impact on college students’ grades, achieve- 
ment, and attitudes of the respective "correction" procedures used in 
Bloom- and Keller-type mastery learning strategies. Forty-four male 
and female students were taught European historiography using a 
2 X 3 factorial, Pretest/No Pretest X Bloom-Type Correction/Keller- 
Type Correction/No Correction design. The findings indicate that 
periodic correction can improve at least the students' ability to apply 
the course material, but only if it is accomplished as in Bloom's strat- 
egy. The findings also suggest that pretesting can increase students' 
ability to apply the course material and their attitudes toward history. 


All learning for mastery strategies share 
some basic ideas about how instruction 
should be designed (Block, 1971). Yet to 
date there has been virtually no research as 
to how these ideas might work singly or in 
concert to facilitate student learning (Block, 
1974). Accordingly, the present research at- 
tempted to take but one of these ideas and to 
begin to explore its potential impact on stu- 
dent learning. The idea selected was “cor- 
rection.” 

Each mastery learning strategy assumes 
that, if someone is having learning problems 
on one segment of instructional sequence, 
these problems must be cleared up before 
they impair future learning. Hence, each 
carefully monitors student learning over 
every segment of the instruction and period- 
ically remediates or corrects problems as 
they develop (Block, 1974).

But while each mastery learning strategy
possesses some provision for correction,
some strategies correct very differently than
others. This is especially the case with the
two best known approaches to mastery
learning: Keller’s (1968) Personalized Sys-
tem of Instruction and Bloom’s (1968)
Learning for Mastery Strategy. In Keller’s
approach, correction is typically accom-
plished by returning the student to the orig-
inal instructional materials and methods for
the segment upon which he is having learn-
ing problems. In Bloom’s approach, it is
usually accomplished by sending the student
on to supplementary instructional materials
and methods that teach the problematic
subject matter in ways different from the
original materials and methods.


5 We are indebted to Robert M. Bortnick, Edwin
M. Bridges, John W. Cotton, and Chester W.

Accordingly, the present research ad- 
dressed two questions about the idea of 
correction. First, does correction have a 
positive impact on students’ learning? Sec- 
ond, does it matter how the correction is ac- 
complished? To begin to answer these ques- 
tions, the impact on college students’ learn- 
ing of correction as it is practiced in Keller’s 
and Bloom’s approaches was examined. 


METHOD 


Subjects 


Forty-four upper division college students en- 
rolled in a 10-week European historiography course 
were the subjects. Thirty of the students were 
history majors and 27 had taken at least one other
course from the instructor. Twenty-two of the stu- 
dents were males and 22 were females. 


Treatments 


Three instructional treatments were used to 
teach the course. The first treatment, the control 
treatment, was the traditional lecture/discussion 
approach. In this treatment, subjects attended 50- 
minute lectures three times a week during the 
quarter and read six required books. 

The second treatment, the redirected study 
treatment, used the traditional approach plus a
Keller-type correction procedure. Once every two 
weeks, students in this treatment received a multi- 
ple-choice, diagnostic-progress or formative (Aira- 
sian, 1969) test on the readings and lectures for the 
preceding two-week period. At the next class section 
their tests were returned. Each student was told 
which items he had answered correctly, was pro- 
vided with the correct answers to missed items, and 
was given a prescription for learning the material 
tested by these missed items. This prescription indi- 
cated whether the items missed were “knowledge” 
items or “application” items (Bloom et al., 1956). 
And for each missed application item, it indicated 
which knowledge items covered related material. 
The prescription then directed the student to re- 
study and review the original reading materials and 
lecture notes for the previous two-week period. 

The third treatment, the small-group study 
treatment, used the traditional approach plus a 
Bloom-type correction procedure. Students in this 
treatment were given the same diagnostic-progress 
tests biweekly as the redirected study group, but 
they had their test results returned during co- 
operative small-group study sessions typical of 
Bloom-type mastery learning strategies (see Block, 
1971; Kim, 1971; Lee et al., 1971). 

In these sessions, groups of approximately four 
students would meet together one night a week for 
one hour. At the start of the session, each student
was given his corrected diagnostic-progress test 
with the correct answers indicated. Then each stu- 
dent in turn was asked to select one of the questions 
that he had answered correctly and to explain in his 
own words why he had selected that answer. The 
other students were encouraged to ask questions or 
to justify why they felt that other possible answers 
might be correct. Cooperative rather than competi-
tive discussion was stressed, and no premium was
placed upon having to come to agreement with the 
answer that was indicated as the correct one. This 
procedure was followed until all of the items on the 
diagnostic-progress test had been discussed.

It is important to note that due to the constraints 
of the instructional setting, in neither correction 
treatment could students be formally required to 
use their respective correctives. Hence a host of
informal techniques had to be used. The techniques 
ranged from constant prodding, to asking the re- 
directed study students for periodic self-reports on 
attempts to correct learning problems, to taking
attendance at the small-group study sessions. While 
more formal procedures would have clearly been 
desirable, these informal procedures seemed to 
work quite well. 


Learning Measures 


There were three learning criterion measures 
used in the experiment. One measure was the stu- 
dent’s final letter grade in the course, based on his 
performance on a take-home, essay examination. 
Letter grades were converted to number grades 
according to the standard University of California 
procedure (A+ = 43; A = 40; A— = 37; B+ = 
3.3; etc.). The second measure was the student’s raw 
score on a 50-item multiple-choice achievement 
test. This test was composed of 60% knowledge 
items and 40% application items (Bloom et al., 
1956) and was constructed from a table of instruc- 
tional objectives developed jointly by the second 
author and the course instructor. The third measure 
was a 10-item Likert-type attitude scale adapted 
from the Attitude Toward Mathematics subscale 
reported in Husén (1967). 

Validity data neither were available nor could 
they be gathered on the grade or the attitude mea- 
sures. However, the content validity of the achieve- 
ment test was established by having the course in- 
structor review each item and by dropping and 
revising those items which he did not feel tapped 
his instructional objectives. 

Reliability data were also unavailable on the 
grade measure. The Kuder-Richardson (1937) For- 
mula 20 reliability estimate for the achievement 
test was .78 based upon the pretest scores and .69
based upon the posttest scores. The two-day delay,
test-retest reliability of the attitude measure was 
estimated to be .98. 
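
For readers unfamiliar with Kuder-Richardson Formula 20, the sketch below shows how such a reliability estimate is computed from a matrix of right/wrong item scores. The function name and the toy data are hypothetical, and the use of the population variance of total scores is an assumption (some treatments divide by n - 1); the article does not report the item-level data.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 reliability for a students-by-items matrix of 0/1 scores."""
    n_students = len(responses)
    n_items = len(responses[0])

    # Total score per student and its population variance (assumed convention).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical toy data: 4 students by 3 items.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_reliability = kr20(toy_responses)
```

The estimate rises as the items covary more strongly relative to their individual variances, which is why it can differ between pretest and posttest administrations of the same test.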


Design 


One third of the subjects were randomly assigned 
to each of the three instructional treatments. One
half of the subjects in each treatment were then
pretested with the achievement measure and the
attitude scale so as to establish some of each instru-
ment’s psychometric properties before using them
further. The remaining subjects were not pretested.
Accordingly, the resultant design for this study was
a 2 X 3 factorial design: Pretest/No Pretest X No
Correction/Keller-Type Correction/Bloom-Type
Correction. 


Procedure 


On Day 1 of the study, the second author ad- 
dressed the students and sought their consent to 
participate in the experiment. He then randomly 
assigned the students to the three experimental 
treatments. The students in each group were told 
that they would be treated differently, but that how 
they were treated would not affect their chances of 
earning a high grade. Then one half of the students
in each treatment were randomly selected and pre-
tested with the achievement test and the affective
scale.


TABLE 1

MEANS AND STANDARD DEVIATIONS FOR STUDENT ACHIEVEMENT, GRADES, AND
ATTITUDE TOWARD HISTORY

             |           Achievement            |              Grades              |     Attitude toward history
Group        | Small-group | Redirected | Control | Small-group | Redirected | Control | Small-group | Redirected | Control
Pretest
  M          | 30.67       | 25.00      | 28.88   | 32.55       | 31.62      | 30.88   | 13.50       | 12.88      | 14.88
  SD         |  3.39       |  6.46      |  4.49   |  3.20       |  3.02      |  5.59   |  1.87       |  4.94      |  1.55
  n          |  6          |  8         |  8      |
No pretest
  M          | 27.57       | 24.33      | 24.83   | 28.50       | 29.33      | 33.33   | 11.71       | 11.00      | 10.83
  SD         |  4.28       |  4.21      |  6.55   |  3.68       |  6.60      |  3.14   |  4.82       |  3.90      |  2.99
  n          |  7          |  9         |  6      |

From Day 1 on, the instructor taught the course 
as he had always taught it. Biweekly, he allowed 
the second author to administer the appropriate 
diagnostic-progress test to the redirected study and 
the small-group study students. Control group
students were allowed to leave class early on these 
occasions. At the end of the ninth week, all students 
were administered the multiple-choice achievement
test and the attitude scale without any prior notice 
so as to minimize the effects of last-minute cram- 
ming. They then submitted their essay, take-home 
examinations at the end of the 10th week. These 
exams were graded by the instructor, and final 
grades were assigned by the end of the 11th week. 


Data Analysis 


Originally, 10 students were assigned to each of 
the six cells in the experimental design. Unfor- 
tunately a number of students subsequently 
dropped the course. Hence complete data were ob-
tained from only 44 subjects. A 2 X 3 Pearson χ²
test of association suggested that the subject mor-
tality was essentially random (Winer, 1971). So
these data were then analyzed as if they had come 
from a 2 X 3 factorial design with unequal cell sizes. 
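
A Pearson χ² test of association on a 2 X 3 table can be sketched as follows. The article does not say exactly which counts entered the authors' test; the sketch assumes it was applied to the number of students completing the course in each cell, using the cell sizes as they appear to read in Table 1. The function name, the row and column labels, and the critical value 5.991 (df = 2, alpha = .05, from a standard χ² table) are likewise assumptions for illustration.

```python
from typing import List


def pearson_chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Assumed completer counts; rows and columns as labeled here are an assumption.
# Rows: pretest, no pretest. Columns: small-group, redirected, control.
completers = [
    [6, 8, 8],
    [7, 9, 6],
]
chi2_stat = pearson_chi_square(completers)
degrees_of_freedom = (len(completers) - 1) * (len(completers[0]) - 1)

# Assumed critical value for alpha = .05 with df = 2 (standard chi-square table).
CHI2_CRIT_05_DF2 = 5.991
attrition_looks_random = chi2_stat < CHI2_CRIT_05_DF2
```

Under these assumptions the statistic falls well below the critical value, which is in line with the authors' conclusion that attrition was unrelated to cell membership.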

Fixed effects, nonorthogonal, univariate analysis 
of variance procedures (Bock, in press) were used 
in the data analysis. Each analysis of variance was
performed in two ways since in a nonorthogonal
factorial analysis of variance it matters whether one 
tests for one main effect before another. In the first 
way, the Pretest X Correction interaction was 
tested, then the pretest main effect, and finally the 
correction main effect. In the second way, the Pre- 
test X Correction interaction was tested, then the 
correction main effect, and finally the pretest main 
effect. When a significant main effect for correction
was found on a particular learning criterion mea-
sure in either analysis of variance, the least-square
estimated effects of each correction treatment 
relative to the control treatment were compared. 


RESULTS 


Table 1 reports the within-cell means and 
standard deviations for the scores on the
achievement, grade, and attitude measures. 
The analyses of variance based on these 
descriptive statistics indicated that the 
pretesting had a significant effect on student 
attitude toward history (F = 5.18, 5.49, 
df = 1/38, p < .05).4 The analyses of vari- 
ance also suggested that the correction had 
an effect on students’ total achievement test 
scores that approached significance (F = 
2.84, 2.82, df = 2/38, p < .07). But inspec- 
tion of the least-square estimated effects of 
each correction treatment suggested that per-
haps one approach to correction had had a
very different effect on student achievement 
than the other. The small-group study stu- 
dents achieved an estimated 2.26 more items 
correct than the control students, while the 
redirected study students achieved 2.19 
fewer items correct than the control stu- 
dents. 


4 Here and hereafter we report the two Fs ob-
tained by calculating each analysis of variance in
two ways. The first F corresponds to the F cal-
culated in testing the pretest main effect before the
correction main effect. The second F corresponds to
the F calculated in testing the correction main
effect before the pretest main effect.


Hence to explore in greater detail the
different effects that the two correction
treatments had seemed to have on student 
achievement, it was decided to split each 
student's total achievement test score into 
his subscores on the knowledge items and on 
the application items. Two checks were then 
made to determine whether this split made 
sense, that is, whether the application and 
knowledge items really tapped different as- 
pects of student achievement. 

In the first of these checks, a Pearson prod- 
uct-moment correlation was calculated be- 
tween the students’ subscores on the appli- 
cation items and their subscores on the 
knowledge items. If the application and 
knowledge items tapped different dimen- 
sions of student achievement, then the 
linear correlation between scores on one set 
of items and scores on the other set should 
have been low. The correlation obtained 
was moderately low (r = .36) and only 
marginally different from zero (p < .05). 
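As a rough illustration of this first check, the correlation and its significance test can be sketched in Python. The six pairs of subscores below are hypothetical, and the sample size of 44 is an assumption inferred from the reported error degrees of freedom (38) plus the six cells of the design; neither comes from the study's raw data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical subscores for six students (not the study's data).
knowledge = [16, 14, 17, 12, 15, 13]
application = [14, 10, 13, 12, 11, 12]
r_example = pearson_r(knowledge, application)

# Reported r = .36; N = 44 is an assumption (error df of 38 + 6 cells).
t_reported = t_for_r(0.36, 44)  # about 2.50 on 42 df
```

With 42 degrees of freedom, a t near 2.50 exceeds the two-tailed .05 critical value of roughly 2.02, which agrees with the reported p < .05.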

In the second check, Guttman scalogram 
analysis (Edwards, 1957) was performed on 
the knowledge and application items to 
determine whether knowledge of the course 
material was necessary, but not sufficient, 
for its application, as the work of Bloom et 
al. (1956) would suggest. Median subscores 
on the knowledge and on application items 
were calculated. Each student was then 
given a score of 1 or 0 depending upon 
whether his application subscore was above 
or below the median application subscore, 
and another score of 1 or 0 depending upon
whether his knowledge subscore was above or
below the median knowledge subscore. Next
the Guttman coefficient of reproducibility
was calculated across students under the as- 
sumption that each student who scored 0 
on the knowledge items and 1 on the applica- 
tion items disconfirmed the hypothesis that 
knowledge of the course material was nec- 
essary, but not sufficient, for its application. 

The subscores of only seven students ex- 
hibited the 0, 1 pattern; the coefficient of 
reproducibility was .84. This coefficient is 
slightly lower than the minimum level of .90 
that Guttman has set out as being accepta- 
ble. But as Oppenheim (1966) has noted, .90 
may be too stringent a criterion of reproduc- 
ibility. Accordingly, the coefficient of .84 
may reasonably be taken to indicate that 
the subscores on the knowledge and applica- 
tion items tended to form at least a weak 
Guttman scale. 
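The arithmetic behind the reported coefficient can be sketched as follows. This is a minimal sketch under two assumptions that the text does not state outright: that 44 students were scored (inferred from the error degrees of freedom of 38 plus six cells), and that the coefficient was computed as one minus the proportion of students showing the disconfirming 0, 1 pattern. The split of the remaining 37 students across the other three patterns is hypothetical.

```python
def reproducibility(patterns):
    """One minus the share of (knowledge, application) pairs equal to (0, 1).

    Each pair holds the median-split scores described above: 1 if the
    subscore is above the median, 0 if it is below.
    """
    if not patterns:
        raise ValueError("no response patterns supplied")
    errors = sum(1 for knowledge, application in patterns
                 if knowledge == 0 and application == 1)
    return 1 - errors / len(patterns)


# Seven disconfirming students are reported; the other counts are hypothetical
# and chosen only so that the total equals the assumed N of 44.
patterns = [(0, 1)] * 7 + [(1, 1)] * 15 + [(1, 0)] * 7 + [(0, 0)] * 15
coefficient = reproducibility(patterns)  # 1 - 7/44, which rounds to .84
```

Under these assumptions the result matches the reported .84; a different denominator (for example, students times items) would give a different value.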

Having established that it did make some 
sense to parse each student’s total achieve- 
ment test score into knowledge and applica- 
tion subscores, 2 X 3 nonorthogonal analy- 
ses of variance were then performed on each 
set of subscores to examine the effects of the 
pretesting and the correction. Table 2 re- 
ports the cell means and standard deviations 
of student subscores upon which these anal- 
yses of variance were based. 

The analyses of variance indicated there
was neither a Pretest X Correction interac-
tion effect for subscores on the knowledge
items nor for subscores on the application
items. But there were significant main ef-
fects for both the pretesting (F = 5.59, 5.36,
df = 1/38, p < .05) and the correction (F =
7.50, 7.61, df = 2/38, p < .01) on the appli-
cation items. Examination of the least-
square estimated effects of each correction
treatment indicated that only students who
had received the small-group study treat-
ment answered significantly more applica-
tion items correctly than the students who
had received no correction at all (t = 2.47,
p < .05). The small-group study students
had answered correctly an estimated 2.22
more application items than the control
students, while the redirected study students
had answered correctly an estimated 1.17
fewer application items than the control
group students.


TABLE 2

CELL MEANS AND STANDARD DEVIATIONS
FOR KNOWLEDGE AND APPLICATION SUBSCORES

                      Knowledge treatment group           Application treatment group
Group             Small-group  Redirected  Control    Small-group  Redirected  Control
Pretesting
  M                  16.17       14.50      16.25        14.50       10.50      12.62
  SD                  2.79        4.14       2.82         2.43        2.78       2.39
  n                    6           8          8
No pretesting
  M               [illegible]    14.56      14.83        12.57        9.78   [illegible]
  SD              [illegible]     3.71       4.58         2.15        1.79       2.53
  n               [illegible]      9     [illegible]

In sum, then, if learning was indexed in
terms of the students’ knowledge of the ma-
terial taught, their final course grades, and
their attitudes toward history, then stu- 
dents who received correction periodically 
throughout the course did not learn more 
than students who received no correction. 
This was true regardless of whether correc- 
tion was accomplished by returning the stu- 
dent to the original instruction (redirected 
study) or by sending him to different in- 
struction (small-group study). But if learn- 
ing was indexed in terms of the students’ 
ability to apply the material taught, then 
students who received correction did learn 
more than students who received no correc-
tion, provided that correction was accom-
plished by sending the student to different
instruction (i.e., small-group study).


Discussion 


Two sets of findings have emerged from 
this research. One concerns the effects of 
correction on college students’ learning. The 
other concerns the effects of pretesting. We
will not discuss this latter set of findings
here, for such a discussion would be periph-
eral to our purpose. Clearly, the effect of
pretesting on learning is a question that
bears closer attention in future mastery
learning research.

How then can the correction findings be 
explained? The most obvious explanation
is that correction can have a positive impact
on certain dimensions of student learning, 
but only if it is accomplished in a particular 
way. In particular, this study suggested that 
application of the course material could be 
significantly improved if students used a 
correction procedure which exposed them to 
supplementary instructional methods and 
materials rather than one which required
them to review and practice the original 
methods and materials. 

It is hoped that future research would test 
this explanation using a wider variety of 
correction procedures and a number of 
dimensions of student learning. Such re- 
search might eventually make it possible to 
select correction procedures which, in com- 
bination with ordinary group-based instruc- 
tion, are most powerful for promoting cer- 
tain desired dimensions of student learning. 
In fact, it might even be combined with the 
work of Cronbach and Snow (in press) on 
aptitude-instructional treatment interac- 
tions to make it possible to select correction 
procedures which are most appropriate for 
a given set of learning outcomes as well as a 
given set of learners. 

The major problem with such research is
controlling for whether or not students use their
correctives. If some treatment groups use 
their correctives, while others do not, then 
differences between treatment groups may 
be due to differential usage of correction 
procedures rather than to the type of cor- 
rection used. In this study, we were forced 
to use informal procedures so as to ensure 
that the students in both groups were using 
their correctives. Clearly, more formal pro- 
cedures, consonant with the classroom situa- 
tion, must be developed. We are currently 
exploring several such procedures for use 
in a replication of this study.


PERFORMANCE IN A PERSONALIZED
INSTRUCTION COURSE


GEORGE J. ALLEN, LAURA GIAT, AND ROGER J. CHERNEY


University of Connecticut 


The experimenters studied the effects of locus of control and trait and 
state test anxiety on final grades, short answer and essay examination 
performances, and other academic outcomes obtained from 51 females 
and 37 males enrolled in a Keller-type “personalized instruction” 
class. The course format emphasized student control over the rate 
at which participants mastered self-selected areas of instructional 
material through proctored oral examinations. Locus of control and 
trait anxiety were assessed during the first class, while state anxiety 
was measured immediately before every oral examination. Results 
indicate that students possessing an external locus of control contracted 
for and ultimately earned lower grades, began working more slowly, 
reported more state anxiety during oral assessments, and performed 
more poorly on a written final examination than their more internally 
oriented peers. Trait test anxiety was not reliably related to the aca- 
demic outcome measures. However, a steady and significant reduction 
in self-reported state anxiety during oral examinations throughout the 
semester was found. Suggestions for “tailoring” course formats to 
specific student characteristics so as to improve academic performance 
were discussed. 


A number of investigators have proposed 
operantly derived “personalized” instruc- 
tion courses as alternatives to the more 
traditional lecture-examination format of 
university education (Born, Gledhill, & 
Davis, 1972; Keller, 1966). Most versions 
of personalized courses contain common 
elements which include (a) a priori specifi- 
cation of criteria for acceptable student 
performance on small segments of course 
material, (b) assessments which are non- 
punitive and remediable, and (c) provision 
for individual variation in work rate 
(Johnston & Pennypacker, 1971). The as- 
sumption that students show enthusiasm 
for such courses and demonstrate increased 
learning as a result of participation has
been empirically supported (McMichael &
Corey, 1969; Sheppard & McDermot, 1970).
Most researchers have used an “engineer-
ing approach” in investigating the useful-
ness of personalized formats; that is, ef-
fectiveness of contingency manipulations is
determined by repeatedly measuring be-
havioral changes of entire classes. Individ-
ual performance variability due to stable
preexisting personality differences is treated
as error variance.

* Data analysis was facilitated by National Sci-
ence Foundation Grant GJ-9 to the University of
Connecticut Computer Center.

* Requests for reprints should be sent to G
sity of Connecticut, Storrs, Connecticut 06268.

The purpose of this investigation was to 
examine the behaviors of students differing 
on locus of control (Rotter, 1966) and test 
anxiety (Sarason, 1957) in a personalized 
course which was specifically designed to
maximize students’ control over their aca-
demic performance. Since perceived internal
control is associated with superior use of
personally relevant information (Phares,
1968), it was predicted that internally ori-
ented students would (a) begin fulfilling
course requirements more quickly, (b) earn
higher course grades, and (c) predict their
course performance more accurately than 
students with an external locus of control. 
As anxiety apparently hinders performance 
on complex cognitive tasks (Spielberger, 
1966), highly anxious students would be ex- 
pected to be more likely to (a) inaccurately 
predict their final course grades, (b) avoid 
taking test assessments for longer periods of 
time, and (c) achieve lower grades than less 
anxious individuals. In addition, highly anx- 
ious subjects may well manifest a greater 
frequency of task-irrelevant behaviors dur- 
ing oral examinations, thus interfering with 
concentration on the test questions them- 
selves. The effect of prior academic per- 
formance on functioning in the course was 
also examined. 


METHOD


Subjects 


Of 101 students initially enrolled in an abnor-
mal psychology course taught by the first author,
90 completed all requirements, and psychometric
test data were collected from 51 females and 37
males. These 88 subjects had obtained a mean pre-
vious semester grade point average (GPA) of 3.23,
with a majority of the students being classified as
seniors (54%) or juniors (40%).


Instructional Format 


The course was designed to provide students 
with an opportunity to demonstrate knowledge of 
specific content areas at individually convenient pe- 
riods throughout the semester. Knowledge of mate- 
rial presented during optional lectures was not as- 
sessed for purposes of fulfilling contracts for a 
grade. During the first class, each student received 
a manual specifying course procedures. After read- 
ing the manual, students returned signed contracts 
which specified the grade each planned to earn. 

Students also received a 77-page booklet con- 
sisting of 19 self-contained programmed instruc- 
tional packages (PIPs). Each PIP contained refer- 
ences to recent journal articles in specified topical 
areas (e.g., mental retardation, desensitization,
childhood autism). The articles were designated as
either required or suggested reading. Three ques-
tion sets were also included: the first tapped
factual aspects of each required reading; the sec-
ond set required integration of specific experimen-
tal findings into general conclusions; the third set
assessed knowledge of both factual and implicative
aspects of the suggested readings. Each PIP covered
approximately 92 pages of required and 81 pages
of suggested readings.


After reading material for a particular PIP, the 
student contacted 1 of 11 undergraduate proctors 
(all of whom had earned an A in a previously 
taught course) to take an oral examination. During 
this assessment, the student answered a proctor- 
selected sample of at least four factual and all 
integrative questions. An acceptable performance
on this portion of the oral earned the student 4 
points. Answering questions pertaining to suggested 
readings earned an additional 2 points. Following 
the oral, the proctor recorded the duration of the 
student's speaking time and rated his performance 
on a scale from 1 (poor) to 5 (excellent). The stu- 
dent completed an identical but independent rating 
of his own proficiency and estimated the time he 
spent reading articles and reviewing for the oral. 

To reinforce rapid completion of assessments, 2 
bonus points were awarded for successfully passing 
an oral during the first five weeks of the semester, 
and 1 bonus point was added for each PIP com- 
pleted during the second five weeks. The conver-
sion of points to grades was as follows: A = 30;
B = 22-29; C = 15-21; D = 8-14; and F = 7.
Thus, students could earn an equivalent number 
of points by taking one less oral if they completed 
assessments early in the semester. An average of 
238 orals were completed by the class (range =
0-7) during the semester. 
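The point scheme described above can be sketched as a small calculation. The function names are illustrative, and two boundary conditions are assumptions rather than statements from the text: that 30 or more points earned an A, and that 7 or fewer points earned an F.

```python
def oral_points(week, answered_suggested=False):
    """Points for one passed oral taken in the given week of the semester."""
    points = 4                      # acceptable factual and integrative answers
    if answered_suggested:
        points += 2                 # questions on the suggested readings
    if week <= 5:
        points += 2                 # bonus during the first five weeks
    elif week <= 10:
        points += 1                 # bonus during the second five weeks
    return points


def letter_grade(total_points):
    """Convert total points to a grade (A and F cutoffs are assumed open-ended)."""
    if total_points >= 30:
        return "A"
    if total_points >= 22:
        return "B"
    if total_points >= 15:
        return "C"
    if total_points >= 8:
        return "D"
    return "F"


# Four early orals versus five late orals, all including suggested readings.
early_total = sum(oral_points(week, True) for week in (2, 3, 4, 5))
late_total = sum(oral_points(week, True) for week in (11, 12, 13, 14, 15))
```

In this sketch a student who passes four orals early earns 32 points and a student who passes five orals late earns 30, so both reach an A, which is the trade-off the paragraph describes.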


Personality and Predictor Variables 


Six potential predictors of academic success were 
employed. During the first class, each student com- 
pleted measures of locus of control (Rotter, 1966), 
trait test anxiety (Sarason, 1957), and a state anxi-
ety differential (Husek & Alexander, 1963). The
latter measure was also administered by the proc- 
tors immediately before a student took an oral, as 
it is sensitive to anxiety fluctuations as a result of 
examination stress (Allen, 1970). Previous semester 
GPAs and Verbal Scholastic Aptitude Test (SAT)
scores of all class members were also obtained. 


Academic Outcome Measures 


The grade originally contracted for was used as 
an index of expectancy for success (A = 4, B = 3,
etc.) and was compared with the final grade which
was actually earned. Another performance criterion 
involved determining the latency between the first 
day of class and the day each student passed his 
first oral. Latency was computed on the basis of a 
6-day week, as no orals were given on Sundays. 
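One way to compute such a latency is to count the days from the first class to the first passed oral while skipping Sundays. The sketch below assumes that the first class day itself is not counted and that every other weekday and Saturday counts; the text does not specify these details, and the dates are hypothetical.

```python
from datetime import date, timedelta


def latency_in_class_days(first_class, first_pass):
    """Days from first_class to first_pass on a 6-day week (Sundays skipped)."""
    if first_pass < first_class:
        raise ValueError("first_pass cannot precede first_class")
    days = 0
    current = first_class
    while current < first_pass:
        current += timedelta(days=1)
        if current.weekday() != 6:  # date.weekday(): Monday is 0, Sunday is 6
            days += 1
    return days


# Hypothetical dates: a Monday first class and a first pass two weeks later.
example_latency = latency_in_class_days(date(2024, 1, 1), date(2024, 1, 15))
```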

During the final class, a “knowledge of abnor- 
mal psychology” test was administered. The ex- 
amination consisted of 45 multiple-choice questions 
drawn from instructor manuals of three abnormal 
psychology texts by three professors who taught 
sections of this course the previous semester. Four 
short essay questions worth 55 points were also 
provided by the three instructors. Each essay con- 
tained two or three sections which were graded ac- 
cording to strictly defined criteria on a 5-point 
scale by two independent raters. Interscorer relia-
bility of total essay scores was quite high (r = .96),
and scores of both raters were averaged to yield 
data for further analysis. This test was designed to 
assess general knowledge of important concepts 
commonly taught in abnormal psychology courses, 
rather than retention of the greatly diversified 
material contained in the PIPs. Students were in- 
formed that their scores on this test would not 
affect their grade, but were urged to do their best. 

Student and proctor ratings of oral performance, 
number of minutes of student talking time during 
the assessment, and the number of hours the stu- 
dent reported studying for the oral were also used 
to assess outcome. 


RESULTS 


Since the students completed a variable 
number of oral exams, all data collected 
during these occasions were averaged for 
each subject. This procedure also minimized 
intraindividual variability as a result of 
contact with different proctors and subject 
matter sources at various times throughout 
the semester.


Locus of Control and Academic 
Performance 


Of 14 correlations computed between 
locus of control and the other measures, 6 
were significant at the .01 level. These data 
indicated that an external orientation was 
positively associated with state anxiety dur- 
ing the first class (r = .31) and during oral
examinations (.35), as well as negatively
related to performance on the essay portion 
of the written examination (—.49). Students 
having an external perception also tended 
to begin completing orals later in the semes- 
ter (.43), initially contracted for lower 
grades (—.32), and ultimately obtained 
poorer marks (—.46). 

Second-order partial correlations be- 
tween locus of control and several outcome 
variables were computed in order to parti- 
tion out the influence of intellectual ability 
as measured by GPA and SAT. The simul- 
taneous removal of the effects exerted by 
these variables did not significantly alter 
the magnitude of the previously reported 
relationships between externality and either 
course grade (rp = −.45) or oral examina-
tion anxiety (rp = .35). Partialing out the
effect of latency attenuated the relationship
between locus of control and final grade
(rp = .31), although the association re- 
mained significant. 
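The partialing procedure can be sketched with the standard recursive formula, in which a second-order partial correlation is built from first-order partials. In the example call, the values −.46 (externality with final grade) and −.10 (locus of control with GPA) are taken from the text; the remaining four correlations are hypothetical placeholders, so the result is illustrative only.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def second_order_partial_r(r_xy, r_xz, r_xw, r_yz, r_yw, r_zw):
    """Partial correlation of x and y with both z and w held constant."""
    r_xy_z = partial_r(r_xy, r_xz, r_yz)   # x with y, controlling z
    r_xw_z = partial_r(r_xw, r_xz, r_zw)   # x with w, controlling z
    r_yw_z = partial_r(r_yw, r_yz, r_zw)   # y with w, controlling z
    return partial_r(r_xy_z, r_xw_z, r_yw_z)


# x = locus of control, y = final grade, z = GPA, w = SAT verbal.
example = second_order_partial_r(
    r_xy=-0.46,   # from the text
    r_xz=-0.10,   # from the text
    r_xw=-0.05,   # hypothetical
    r_yz=0.35,    # hypothetical
    r_yw=0.20,    # hypothetical
    r_zw=0.40,    # hypothetical
)
```

When the control variables are nearly uncorrelated with the predictor, as the text reports for locus of control and GPA, partialing them out leaves the original correlation largely unchanged.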

Locus of control scores were divided at
the median and the means for the resultant 
groups of “internals” and “externals” on all 
other variables were computed. Table 1 pre- 
sents these data, as well as means for me- 
dian-splits on trait test anxiety and previous 
semester grades. Results of multiple t
tests, which are also contained in Table 1,
substantiate the correlational findings. In-
ternals began completing orals 10 days
earlier than externals and averaged 8 points
higher on the total evaluation test score,
with differences on both short answer and
essay questions being significant.


TABLE 1

PERSONALITY [illegible] AND ACADEMIC PERFORMANCE MEANS FROM MEDIAN-SPLITS OF LOCUS
OF CONTROL, TEST ANXIETY, AND PREVIOUS SEMESTER GRADES

Columns: Internal, External, Low anxiety, High anxiety, Low GPA, High GPA.
Measures: locus of control; trait test anxiety; state anxiety in first class;
averaged state anxiety in orals; previous semester grades (GPA); Scholastic
Aptitude Test verbal; contracted grade; final grade; latency; short answer
test; averaged essay; averaged speaking time in oral; averaged student
performance rating; averaged proctor performance rating; averaged reported
study time.
[Cell values are illegible in this copy.]
* p < .05.

No reliable differences between internals 
and externals were found on trait test anxi- 
ety, previous GPA, or Scholastic Aptitude 
Test (SAT). In addition, even though in- 
ternals performed more effectively on the 
academic outcome measures, they did not 
spend more time studying for orals, receive 
better proctor ratings, or take less time to 
finish their orals. These data generally sup- 
port the first two hypotheses. 

Discrepancies between contracted and 
final grades were found for 12 internals (of 
which 6 overestimated their final grade) 
and 18 externals (including 17 overesti-
mates). The mean absolute discrepancy for
all internals was .19 as compared with .84
for externals (t = 2.42, p < .02), thus sup-
porting the third hypothesis. A 2 x 2 chi-
square comparing direction of discrepancy
and locus of control yielded χ² = 7.95, df =
1, p < .01, indicating that while, as a whole,
those subjects whose contracted and final 
grades were discrepant tended to overesti- 
mate their final grades, external subjects re- 
liably overestimated their final grades, while 
internal subjects equally overestimated and 
underestimated their final grades. 
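The chi-square value can be checked from the counts given in this paragraph: 6 overestimates and 6 underestimates among the 12 discrepant internals, and 17 overestimates and 1 underestimate among the 18 discrepant externals. The sketch below assumes the uncorrected Pearson statistic (no continuity correction), which the text does not state explicitly.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: internals, externals. Columns: overestimated, underestimated.
chi2 = chi_square_2x2(6, 6, 17, 1)  # about 7.95, matching the reported value
```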


Test Anxiety and Academic Performance 


Correlations between trait test anxiety 
and the 14 remaining variables yielded 
three reliable relationships, indicating that 
more generally anxious students reported 
higher levels of state anxiety both in the 
first class (r = .48) and during oral exam- 
inations (.43) and obtained lower verbal 
SAT scores (—.33) than their less anxious 
peers. Statistical analysis of the mean scores 
presented in Table 1 after grouping students 
by a median-split provided one additional 
reliable difference, with less anxious sub- 
jects spending less time in the oral examina- 
tion situation. While this finding may be
weak evidence for the fourth
hypothesis derived from test anxiety theory,
the absence of an anxiety influence on any
other academic outcome measure suggests
rejection of all four predictions made for 
the anxiety variable. 

The poor predictive usefulness of the Test 
Anxiety Scale may have been partially due 
to the course contingencies which were de- 
signed to minimize anxiety. Analysis of 
Anxiety Differential scores collected before 
the orals indicated that the entire class re- 
ported progressively less state anxiety 
throughout the semester. Mean Anxiety 
Differential scores decreased linearly from 
71.51 on the first oral to 62.45 on the fifth. 
A t test of anxiety scores for the first and 
final oral by each student indicated that 
subsequent assessments were perceived as 
less stressful (t = 2.45, p < .05). The mag- 
nitude of state anxiety decrements was un- 
related to initial level of trait anxiety. 


Supplemental Analyses 


The best single predictor of final grade 
was latency (r = —.63), indicating that the 
sooner a student began completing course 
requirements, the higher final grade he ob- 
tained. A stepwise multiple regression of the 
six predictor variables was used to predict 
final grade. Locus of control accounted for 
the greatest amount of variance, and GPA 
was the only other variable which signifi- 
cantly augmented predictive accuracy by 
raising R from .46 to .57. These same two 
variables were the only significant contribu- 
tors to prediction of latency in a second 
regression analysis. Previous semester 
grades were reliable predictors of short an- 
swer and essay scores and both student and 
proctor ratings of oral proficiency. 

These results point toward a similar pat- 
tern of performance by students with an in- 
ternal locus of control and high grade point 
averages. It should be noted, however, that 
these two dimensions were relatively in- 
dependent of each other (r = —.10) and 
that both locus of control (r = .15) and
prior academic performance (r = .06) were
not associated with trait test anxiety.


Discussion 


The results of this study represent a repli- 
cation of several previously reported find- 
ings and offer clarification of several current
ambiguities concerning student performance
in personalized instruction courses. The 
strong negative relationship between la- 
tency and final grade is in agreement with 
Sheppard and MacDermot (1970), who re- 
ported that students who earned higher 
grades in an individualized course com- 
pleted assessments at a faster rate than 
students who received lower grades. 

The differential efficacy of personalized 
formats for teaching various “types” of 
students has only been tangentially investi- 
gated. Born et al. (1972) concluded that in- 
dividualized instruction is most beneficial 
to students with histories of poor to medi- 
ocre academic performance. Learning out- 
comes in the present study, however, 
favored students with above average aca-
demic records, as these students began
working earlier in the semester, were rated 
as providing more proficient performances 
on both oral examinations and on a written 
test, and earned better grades in the course. 

It is interesting to note that although in- 
ternal and external subjects were equivalent 
on prior scholastic aptitude and achieve- 
ment measures, the former group produced 
reliably better performances on the aca- 
demic outcome measures. 

The usefulness of the locus of control 
measure in predicting academic outcome 
was consistent with the centrality of self-
controlled student planning to the course.
Phares’s (1968) finding that internally ori- 
ented subjects manifest superior use of per- 
sonally relevant information was extended 
to an academic setting by the present study. 
Internals made very effective use of course 
contingencies, particularly those which re-
warded early completion of oral exams.
During the first five weeks of the semester,
81% of the internals passed their first oral
examination (thus earning 2 bonus points),
whereas only 49% of the external group
achieved this incentive. The more accurate
prediction of their final grades by inter- 
nal subjects also substantiates Phares’s 
conclusion and is in agreement with findings 
reported by Wolfe (1972). 

The consistent failure to find reliable 
relationships between locus of control and 
overall indices of GPA may be due to the
widespread use of externally imposed rather
than self-imposed control contingencies in 
traditional academic instruction (Aiken, 
1963). For example, internal subjects 
achieved an average final course grade of 
3.25, which was equivalent to their mean 
prior semester GPA of 3.21, while external 
subjects manifested a course grade decre- 
ment of one full grade (M = 2.24) in com- 
parison with their previous GPA of 3.24. 
These data imply that internal subjects 
maintain a constant level of performance 
regardless of course format. The achieve- 
ment of external subjects, however, declines 
sharply in a course which stresses student 
control of contingencies. 

The lack of relationships between test 
anxiety and course performance is explica- 
ble in terms of the design of the course, 
which led to a progressive reduction of state 
anxiety throughout the semester. Analysis
indicated that the decline in state test anxi- 
ety did not vary as a function of past or 
present academic performance but rather 
was due to the effectiveness of the course 
contingencies. While it may be speculated 
that personalized formats do alleviate anxi- 
ety to a greater extent than traditional 
examination courses, this question awaits 
further investigation. 

Perhaps the data also imply that methods 
of education should be tailored to the char- 
acteristics of the individual because not all 
forms of teaching are equally effective with 
all students. If internal subjects can pro-
gress more rapidly and accrue more benefits
from a semiautonomous, self-paced educa- 
tion, educators might give serious considera- 
tion to providing this option. It is necessary 
that not only intellectual inputs be the focus 
of education, but that individually appro- 
priate motivational orientations be con- 
sidered as well.


EFFECTS OF MODE OF MODELING, MODEL AGE, AND
ETHNICITY ON RULE-GOVERNED
LANGUAGE BEHAVIORS


JAMES I. GRIESHOP AND MARY B. HARRIS*
University of New Mexico 


In order to assess the effects of modeling on the performance of 
syntactic and semantic language behaviors, 208 male and female 
Anglo and Chicano sixth-grade students were randomly exposed to 
live, audiotaped, or written models, who were characterized as Anglo 
or Chicano adults or peers, or to a no-model control group. In the 
absence of direct or vicarious reinforcement and of instructions to 
imitate, significant modeling effects for almost all measures in both 
imitation and generalization phases were found. Within modeling 
conditions, there were significant tendencies for live and audiotaped 
models to produce greater imitation of the semantic categories than 
written models and for adult models to produce greater imitation of
the relative clause measure than peer models.


Although the phenomena of modeling and 
imitative behavior have long been of inter- 
est to researchers, the systematic investiga- 
tion of the effects of social learning varia- 
bles on rule-governed linguistic and cogni- 
tive variables is a much more recent ap- 
proach (Zimmerman & Rosenthal, 1974). 
Social learning research in areas such as ag- 
gression, altruism, and therapy for phobias 
has demonstrated that filmed, videotaped, 
and written models could be effective in 
altering behaviors, but nevertheless almost 
all the research on rule-governed cognitive 
behaviors has been done using a live model.
Although videotaped (Zimmerman & Dia-
lessi, 1973) and written (Harris & Evans,
1973, 1974) models have been used to influ-
ence creative behaviors, there have been no
systematic attempts to assess the effects of
mode of modeling on any linguistic behav- 
iors. The present study used live, tape- 
recorded, or written models to see whether 
their effects would be equivalent or whether 
the impact of a live model would increase 
the amount of attention and thus learning 
and imitation demonstrated by the observer 
(Bandura, 1971). 

A second variable investigated in the 
present study was the age of the model. Re- 
sults of research on the influence of peer 
versus adult models have been contradic- 
tory (e.g., Bandura & Kupers, 1964), with 
some studies finding a greater influence of 
adult models (Nicolas, McCarter, & Heckel, 
1971), some finding peer models more influ-
ential (Bandura, Grusec, & Menlove, 1967),
and some finding no differences between the 
two (Malcolm, 1971). The reason for the in- 
consistency of findings probably lies in the 
discrepancy between two factors which are 
both felt to increase the influence of a model: 
similarity, which is greater for a peer; and 
expertise or prestige, which is greater for 
an adult. The present study used adult and
peer male models to see which of the two
would have the greater influence in this 
instance. 

A final variable manipulated was the 
ethnicity of the model. The models either
were (or were presented as, in the written 
conditions) Chicano or Anglo and identifia- 
ble as such by appearance, name, and ac- 
cent (by name alone in the written model 
conditions). Three models of each age and 
ethnic group served in the experiment, to 
insure that the age and ethnicity variables 
were not confounded with any unique char- 
acteristics of any particular individual. Al- 
though no previous studies have investi- 
gated the effects of exposure to a Chicano 
model, investigations of the effects of ob- 
serving black models and white models by 
black subjects and white subjects (Breyer & 
May, 1970; Liebert, Sobol, & Copeman, 
1972; Rosenbaum, 1972; Thelen, 1971; 
Thelen & Fryrear, 1971) have produced
completely contradictory findings. Garcia 
and Zimmerman (1972) did find that praise 
from a Chicano experimenter (not a model) 
was more reinforcing than praise from an 
Anglo experimenter. Based on the assump- 
tion that similarity to a model would in- 
crease imitation (Bandura, 1971), an inter- 
action of ethnic status of model and ethnic 
status of subject was expected, such that 
Chicano subjects would imitate Chicano 
models and Anglo subjects would imitate
Anglo models more than vice versa.

Both Chicano and Anglo boys and girls 
served in the study; no overall ethnic or sex 
differences were predicted. The dependent 
measures used in the present study were two 
semantic valuational categories, two syntactic
structures (relative clauses and prepositional
phrases), and overall sentence length.


METHOD 
Subjects, Models, and Experimenter

Subjects were 208 sixth-grade students from four
elementary schools in Albuquerque, New Mexico.
Eight boys and eight girls, half of whom were
Chicanos and half were Anglos, were randomly
assigned to each of the 12 experimental conditions
and the control group.

Six boys aged 11-12 and six men, half of whom
were Chicanos and half were Anglos, were trained
and used as models in the live and taped condi- 
tions, with approximately equal numbers of sub- 
jects exposed to each of the models. In the written 
condition, the age and names of four real models 
appeared on the pages handed out to the subjects 
along with the model's answers. 

The experimenter was a 32-year-old male Anglo 
graduate student.

Stimulus materials. Two parallel but different 
12-item sets of line drawings of persons or animals 
engaged in familiar activities were selected from a 
larger number of pictures in a pilot study on the 
basis of eliciting complete sentences from subjects 
as well as having approximately equivalent num- 
bers of prepositional phrases and equal sentence 
length. No subjects in the pilot study used any 
relative clauses or valuational statements. The first 
12 pictures served as stimuli for the model and 
were used in the imitation phase for subjects in 
the experimental conditions. The other set of
pictures was used to assess generalization. Subjects 
in the control condition simply saw both sets of 
pictures without one identified as previously 
having been shown to a model. 

Model’s statements. A different statement was 
associated with each of the 12 stimulus pictures 
seen by a model. Each statement contained a
prepositional phrase (e.g., “on a court,” “with each
other”), a relative clause (e.g., “who is lying,”
“which carries the baby”), and a valuational-preference
belief category (e.g., “Tom believes the
best barber—,” “the bear doesn’t enjoy—”). The
mean sentence length for the 12 modeled sen- 
tences was 15.25 words. 


Procedure 


Subjects were run in small groups of 4 to 7, with 
individuals rather than groups serving as the unit 
of random assignment. After being escorted by the
experimenter to the experimental site, they were told,
“I am interested in how people make up sentences
and today I want to find out how you make up 
sentences.” In the live model conditions, a model 
was also present, and the experimenter told the 
subjects that another person was going to compose 
sentences before they did. The model was then 
asked his name by the experimenter and asked 
to compose sentences for each of the pictures 
which were shown by the experimenter to the 
model and the subjects. After the 12 sentences 
were composed, the experimenter thanked the 
model and said to the subjects, “Now that you
listened to (name of model) I want you to make
up sentences about the same pictures.” He then
asked each subject to write his name on a sheet 
of paper and to write his sentences on the sheet. 
The experimenter then held up the 12 pictures in 
sequence, waiting until each subject had finished 
composing a sentence for each picture before 
proceeding to the next. The papers were then collected
and new ones distributed to the subjects,
who were then shown a new set of 12 pictures
and asked to make up a sentence about each of 
them. After this generalization task, the papers 
were collected, and the subjects were asked not to 
discuss their experience, and were thanked and
dismissed. 

Subjects in the other conditions were treated 
identically except for the presentation of the 
model. Those in the audiotaped model conditions 
were told that they would first listen to a tape of 
another person making up sentences. The tape 
began with a dialogue between the experimenter 
and the model, in which the model was asked to 
state his name and age and asked to compose 
sentences about the pictures. As the model com- 
posed the sentences on tape, the pictures were 
shown to the subjects by the experimenter. After 
the model had composed his sentences and been 
thanked by the experimenter, the tape recorder 
was turned off and the subjects were treated as 
in the live model conditions. 

Subjects in the written model conditions were 
told that they would see some sentences written 
by another person before composing their own. 
Each subject was then given a set of the 12 stimu- 
lus pictures, with the model's sentences written 
below each picture in handwriting appropriate for 
a child or adult. The model's name and age ap- 
peared on a cover sheet. Since some subjects had 
difficulty reading the handwriting, the experi- 
menter held up the stimulus cards and read aloud 
to subjects the sentences which the model had 
written. The stimulus pictures were then collected, 
and subjects were asked to compose sentences 
exactly as in the other conditions.

The control group subjects were treated identically
to the experimental subjects with the exception
that no presentation or mention of a model
was made. They were merely told that the experimenter
was interested in how they made up
sentences, shown the same pictures as the experimental
subjects, and asked to write their sentences.

TABLE 1
F VALUES FOR A PRIORI COMPARISONS OF
MODELING VERSUS CONTROL GROUPS

                                  Dependent measure
Phase            Valuational   Other value   Relative clause   Prepositional   Length
Imitation        48.23***      5.62*         6.84**            —               23.82***
Generalization   —             8.12          1.63              13.32***        18.76***

Note. A dash indicates that the F value was not computed, as the
one-way analysis of variance was not significant.
* p < .05.
** p < .01.
*** p < .001.

Scoring. All scoring was done with the identity
of the subject, his school, and his experimental
condition unknown to the scorers. A random 30%
of the papers were independently scored by a second
judge, and Pearson product-moment correlations
were calculated for each measure in each
phase. They ranged from r = +.92 for the valuation
category-imitation phase to r = +1.00 for
length in both phases.
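
The interjudge agreement check described above can be illustrated with a short computation. The following is a minimal sketch, assuming each judge's scores for one measure are held as a list in the same paper order; the function name `pearson_r` and the two score lists are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one judge's scores are constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores given by two judges to the same six papers
# (illustrative values only, not taken from the study).
judge_a = [7, 5, 0, 9, 3, 6]
judge_b = [7, 4, 0, 9, 3, 6]
r = pearson_r(judge_a, judge_b)
print(f"interjudge r = {r:+.2f}")
```

A coefficient near +1.00, as reported for length, means the two judges ordered and spaced the papers almost identically on that measure.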

There were five dependent measures scored for 
each of two phases, making a total of 10 measures. 
The valuation category was considered present if 
a value, preference, or belief attributable to the 
pictured person or animal was stated in the sen- 
tence’s predicate, for example, “The boy likes to 
eat spaghetti.” The other value category was con- 
sidered present if any word or phrase expressing a 
value, preference, or belief was present, for ex- 
ample, “good,” “favorite.” Relative clauses were 
defined as constructions headed by a relative 
pronoun (who, that, which) and following a noun 
phrase, for example, “the man who is brushing his 
teeth—.” Prepositional phrases were defined as
phrases consisting of a preposition, an article (with
rare exceptions) and a noun, for example, “on a 
court.” Length was determined by counting the 
number of words per response, with contractions 
scored as single words. 
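
Two of the scoring rules above are mechanical enough to sketch in code. The example below is only an approximation under stated assumptions: the helper names are hypothetical, the length rule treats any word containing an apostrophe as one word, and the relative clause check merely flags a relative pronoun that is not the first word. It does not verify that a noun phrase precedes the pronoun, so it will over-count uses of "that" in other roles, which a human scorer would exclude.

```python
import re

# The three relative pronouns named in the scoring definition.
RELATIVE_PRONOUNS = {"who", "that", "which"}


def tokenize(sentence):
    """Split a response into lowercase words.

    An apostrophe stays inside its word, so a contraction such as
    "doesn't" is counted as a single word.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", sentence.lower())


def sentence_length(sentence):
    """Number of words in one response, contractions scored as single words."""
    return len(tokenize(sentence))


def may_contain_relative_clause(sentence):
    """Crude flag: a relative pronoun occurs somewhere after the first word.

    Assumption for illustration only; no check is made that the pronoun
    actually follows a noun phrase.
    """
    words = tokenize(sentence)
    return any(word in RELATIVE_PRONOUNS for word in words[1:])


print(sentence_length("The bear doesn't enjoy honey"))
print(may_contain_relative_clause("The man who is brushing his teeth smiles"))
```

A flag of this kind could at most pre-screen responses; the definitions in the text still require a judgment about sentence structure.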


RESULTS
Control Groups 


The scores for the valuational and other
value categories, because of the high incidence
of zero scores, were analyzed by chi-square
analyses. No significant differences
were found on either of these measures in
either the imitation or generalization phases
(all χ² values < 2, all df = 1, all ps > .05).
No relative clauses were written by any
subject in either phase, making statistical
analyses unnecessary. A 2 × 2 (Ethnicity ×
Sex) analysis of variance was performed on
the prepositional phrase and length data.
No effect for prepositional phrases was
found, but in the generalization phase females
wrote significantly more words (X̄ =
88.125) than males (X̄ = 73.000, F = 5.01,
df = 1/12, p < .05).


Experimental versus Control Groups 


TABLE 2
MEAN SCORES FOR 12 SENTENCES FOR MODELING AND CONTROL GROUPS

                                              Dependent measure
Phase                      Valuational   Other value   Relative clause   Prepositional   Length
Imitation
  All controls                .375          .313           .000             4.000          82.088
  All modeling subjects      6.026*        1.167*         1.259*            5.823         115.729*
  Live model subjects        7.000ᵃ        1.016          1.450             5.562         115.344
  Taped model subjects       6.719         1.109          1.250             5.594         118.328
  Written model subjects     4.359         1.375          1.078             6.312         113.516
Generalization
  All controls                .250          .125           .000             3.438          80.563
  All modeling subjects      1.729          .542           .385             6.130*        109.604*
  Live model subjects        2.344ᵃ         .656           .375             5.656         105.500
  Taped model subjects       1.797          .359           .203             6.125         111.891
  Written model subjects     1.047          .609           .578             6.609         111.422

ᵃ Live model and taped model subjects higher than written model subjects but not different from each
other (p < .05).
* Combined modeling subjects higher than combined controls (p < .05).

One-way 13-group analyses of variance,
collapsing across subject sex and ethnicity,
revealed significant effects for the valuational
category (imitation phase), other
value category (both phases), relative
clause measure (both phases), prepositional
phrase measure (generalization phase), and
length (both phases) at the p < .05 level or
beyond (smallest F = 1.80, df = 12/195). The
results of Scheffé a priori comparisons of the
12 modeling groups versus the control group
for the measures with significant between-group
differences are presented in Table 1.
Scores for subjects in the modeling groups
were significantly higher than for control
group subjects on the valuational (imitation
phase), other value (imitation phase), relative
clause (imitation phase), prepositional
phrase (generalization phase), and length
(both phases) measures, indicating a strong
modeling effect.
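
The degrees of freedom reported for these analyses follow from the group structure described under Method. The sketch below reconstructs that arithmetic; it assumes that the 12 experimental conditions are the full crossing of three modes, two model ages, and two model ethnicities, and that the analyses of the modeling groups treat the five listed factors as fully crossed with equal cell sizes. The variable names are illustrative only.

```python
from itertools import product

# Factor levels taken from the text; full crossing is an assumption.
modes = ["live", "taped", "written"]
model_ages = ["adult", "peer"]
model_ethnicities = ["Chicano", "Anglo"]
subject_ethnicities = ["Chicano", "Anglo"]
subject_sexes = ["boy", "girl"]

experimental_conditions = list(product(modes, model_ages, model_ethnicities))
n_groups = len(experimental_conditions) + 1   # 12 modeling groups plus 1 control group
subjects_per_group = 16                       # eight boys and eight girls per group
n_subjects = n_groups * subjects_per_group

# One-way 13-group analysis of variance.
df_between = n_groups - 1
df_within = n_subjects - n_groups

# Five-factor analysis of the modeling groups only.
n_modeling_subjects = len(experimental_conditions) * subjects_per_group
n_cells = len(list(product(modes, model_ages, model_ethnicities,
                           subject_ethnicities, subject_sexes)))
df_error = n_modeling_subjects - n_cells
df_mode = len(modes) - 1

print(n_subjects, f"{df_between}/{df_within}", f"{df_mode}/{df_error}")
```

Under these assumptions the counts reproduce the 208 subjects, the df of 12/195 for the 13-group analyses, and the error df of 144 that appears with the main effects and interactions for the modeling groups.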


Modeling Groups 


Mean scores on the various measures for
subjects exposed to different modes of
modeling are shown in Table 2. The results
of analyses of variance for the main effects
of mode of modeling, age of the model,
ethnicity of the model, ethnicity of the subject,
and sex of subject on all measures for
both phases of the experiment are shown in
Table 3.²

² Several higher order interactions were statistically
significant but are not discussed further due
to the difficulty of interpreting them meaningfully.

Valuational category. The significant
mode of modeling effect in the imitation
phase was found by Scheffé a priori comparisons
to be due to the fact that subjects
exposed to both live (F = 21.48, df = 1/144,
p < .001) and taped (F = 17.45, df =
1/144, p < .001) models imitated significantly
more than those exposed to a written
model, while not differing from each other.
In the generalization phase as well, those
in the live (F = 6.37, df = 1/144, p < .05)
and taped (F = 3.85, df = 1/144, p < .05)
model conditions had higher scores than
those seeing a written model, while not significantly
differing from each other. Significant
interactions between model age and
subject sex in both the imitation (F = 6.86,
df = 1/144, p < .05) and generalization
(F = 4.11, df = 1/144, p < .05) phases indicated
that male subjects tended to imitate
the adult models more than peer models,
while female subjects imitated peer models
more than adult models.

Other value category. The only meaning- 
ful finding revealed by the analyses of vari- 
ance was a significant effect of subject 
ethnicity in the generalization phase (F = 
5.65, df = 1/144, p < .05), indicating that 
Anglo subjects gave more other value items 
(X̄ = .698) than did Chicano subjects (X̄ =
.385).

TABLE 3
F VALUES FOR MAIN EFFECTS OF THE ANALYSES OF VARIANCE

                                            Dependent measure
Phase and effect          Valuational   Other value   Relative clause   Prepositional   Length
Imitation
  Mode of modeling         12.96***       <1             <1               [?]            [?]
  Age of model              <1            <1            4.53*             [?]            [?]
  Ethnicity of model        <1            <1             [?]              [?]            [?]
  Ethnicity of subject      <1            <1             [?]              [?]            [?]
  Sex of subject            <1            <1             <1               [?]            [?]
Generalization
  Mode of modeling         3.51*         1.97           1.03             1.56            [?]
  Age of model              <1           2.51           4.36*             <1             [?]
  Ethnicity of model        <1            <1            1.51              <1             [?]
  Ethnicity of subject      <1           5.65*          1.82             1.57            [?]
  Sex of subject            <1            <1            2.55              <1             [?]

Note. df for mode of modeling = 2/144. df for all other effects = 1/144.
* p < .05.
** p < .01.
*** p < .001.

Relative clause category. In both the imitation
(F = 4.53, df = 1/144, p < .05) and
generalization (F = 4.36, df = 1/144, p <
.05) phases, subjects exposed to an adult
model (X̄s = 1.615 and .563 for the two
phases) wrote more relative clauses than
those exposed to a peer (X̄s = .979 and
.218). In the imitation phase, a significant
interaction between mode of modeling and
model ethnicity (F = 3.81, df = 2/144, p <
.05) indicated that subjects exposed to live
(X̄ = 1.813) and taped Anglo models (X̄ =
1.688) had higher scores than those subjects
exposed to live (X̄ = 1.313) and taped
Chicano models (X̄ = .813), but that subjects
exposed to written Chicano models
(X̄ = 1.594) had higher scores than those
who observed written Anglo models (X̄ =
.563). In the generalization phase, a significant
interaction effect of model age and
model ethnicity (F = 3.86, df = 1/144, p <
.05) revealed that subjects exposed to Anglo
adult models had lower scores (X̄ = )
than those subjects exposed to Chicano
adult models (X̄ = .833), but subjects exposed
to Anglo peer models had higher
scores (X̄ = .271) than those subjects who
observed Chicano peer models (X̄ = .146).

Prepositional phrase category. The only
significant effect was an interaction between
model age and model ethnicity in the generalization
phase (F = 5.70, df = 1/144,
p < .05), which showed that subjects observing
Anglo adult models (X̄ = 5.729)
had lower scores than those subjects who
observed Chicano adult models (X̄ =
6.813), while subjects exposed to Anglo peer
models (X̄ = 6.333) had higher scores than
those subjects exposed to Chicano peer
models (X̄ = 5.479).

Length category. Although no significant
effects were found in the imitation phase,
an interaction between mode of modeling
and model age (F = 4.569, df = 2/144,
p < .05) in the generalization phase revealed
that subjects exposed to live (X̄ =
110.625) and written adult models (X̄ =
115.750) wrote more words than those subjects
exposed to live (X̄ = 100.375) and
written peer models (X̄ = 107.094), while
subjects hearing taped adult models wrote
fewer words (X̄ = 103.125) than those subjects
who heard the taped peer models (X̄ =
118.250).


DISCUSSION

The results of this study indicated that
sixth-grade students, without explicit or
implicit instructions to imitate and without
reinforcement, were able to abstract rules governing
the use of modeled sentences and subsequently
to use those rules to generate new
sentences in response to novel stimuli. In
particular, the students exposed to a model 
abstracted the rules governing the valua- 
tional categories and relative clauses and 
then used those rules in both imitative and 
novel tasks to write sentences which ex- 
pressed the modeled values and syntactic 
structures, whereas the no-model control 
students did not write sentences with the 
same number of semantic or syntactic structures.
The modeling effect, since it occurred
without instructions or reinforcement, provides
further evidence that imitation of
models can occur without them (Bandura,
1971).

The results, however, only partially clari- 
fied the role and influence of specified atten- 
tional variables within the social learning 
theory paradigm. The mode of modeling 
and the age of the model had some effects 
upon the abstraction and imitation of the 
modeled language categories, both directly 
and in interaction with other factors, par- 
ticularly the sex of the observer. The 
ethnicity of the model, like that of the ob- 
server, had little effect upon the modeling 
phenomenon. The latter observation tends 
to corroborate the conclusion of Harris and 
Hassemer (1972) that the language (Span- 
ish or English) used by the model had no 
effect. The sex of the observer alone had no 
apparent effect upon the outcomes, although 
it did seem to interact significantly with age 
of model. 

Much like the results of the Rosenthal,
Zimmerman, and Durning (1970) study, no
mimicry or precise imitation of the models
and the modeled sentences was found. This
lack of mimicry further supports the social
learning theory belief that modeling procedures
can be used to generate new and creative
responses which conform to the
modeled rule-governed behaviors. Another
interesting feature was the use of the other
value category by subjects exposed to a
model, although only three of the model’s
sentences embodied this category. This
suggests that the students in the present
study, similar to those subjects used by
Odom, Liebert, and Hill (1968) and Liebert,
Odom, Hill, and Huff (1969), seemed to
employ problem-solving strategies to analyze
the modeled sentences, and generalized
their behavior beyond the specific categories
modeled.


Methodological Implications 


Several methodological innovations in- 
corporated in this study have special sig- 
nificance for social learning theory and for 
the practical application of modeling proce- 
dures. 

The fact that 12 different individuals 
served as the models for this study negates 
the possibility that modeling effects may be 
due to some idiosyncrasy of the model used.
This fact is particularly meaningful since 
the majority of modeling studies have used 
only a single model, and it implies that in an 
applied setting any of a number of potential 
models can be used with a degree of effec- 
tiveness. 

A second innovation was the use of peers 
as models. The finding that the scores of 
subjects (with the exception of the scores 
for the relative clause measure) exposed to 
peer models were nearly equal to those of 
subjects observing adult models suggests 
that peers may be effectively used as models 
displaying rule-governed behaviors for 
sixth-grade children, although it is possible 
that with complex grammatical construc- 
tions, such as relative clauses, adult models 
may be more appropriate. 

The small groupings of students used in 
this study indicate that modeling effects for
rule-governed behaviors occur in group 
situations similar to classroom settings. This 
finding replicates that made by Rosenthal 
and Carroll (1972), and has obvious practi- 
cal significance. 

The fact that the subjects were required 
to compose their responses in written form 
without any word lists as cues also distin- 
guished this study from other modeling 
studies. The results clearly demonstrated 
that the specified rules and behaviors could 
be orally modeled, transmitted and ab- 
stracted, and finally converted by the sub- 
jects into written form. The procedural 
innovation and consequent finding imply 
that the procedure could well be incorporated
as a classroom technique for increasing
the performance of rule-governed
language behaviors. Moreover, the results
imply that for some categories of behavior, 
taped and even written examples can be 
used to supplement the teacher as a model. 
3 Cultural Democracy, Bicognitive 


' Development, and Education 


By MANUEL RAMIREZ, III 
and ALFREDO CASTANEDA 


All too soften educational and mental health institutions in the 
United States view the cultures of ethnic groups as having a nega- 
tive effect on a child's intellectual and emotional development. This 
book shows how American education can better serve all children 
through the philosophy of "cultural democracy." Specifically focus- 
ing on Mexican American children, the authors have studied the 
variables that affect children's performances in school as well as 
the development of cognitive styles through socialization. They con- 
clude with many practical suggestions for creating bicultural/bilin- 
gual educational environments to stimulate development of cogni- 
tive flexibility in children. 

October 1974, 189 pp., $11.95] £5.75 
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Language Use and School Performance 


y AARON V. CICOUREL, KENNETH H. 

INNINGS, SYBILLYN H. M. JENNINGS, 

hi XENNETH C. W. LEITER, ROBERT MacKAY, 
HUGH MEHAN and DAVID;R. ROTH 


F Wtere is a significant (and perhaps, disturbing) critique of how ele- 
> mentary school children's abilities, progress, and potential are evalu- 
M ated, It is based on a thorough investigation of how children and 
E j those who evaluate them actually behave and interact. Topics in- 
1 clude; Ad Hocing in the Schools: A Study of Placement Practices in 
" the Kindergartens of Two Schools; Accomplishing Classroom Les- 

[Wm and Some Basic Theoretical Issues in the Assessment of the 
! VP Child's Performance in Testing and Classroom Settings. 

1975, in preparation 

A volume in the Cognitive Sociology Series 


. The Next Generation 
í AN ETHNOGRAPHY OF EDUCATION IN AN 
£X. URBAN NEIGHBORHOOD 


By JOHN U. OGBU 


p This timely volume offers a p 


enetrating analysis of school failure 
asa historical adaption to unequal opportunity. It probes the forces 
that created and still sustain this pattern of failure: local social 
stratification, myths and stereotypes which support that system, the 
behavior of local school personnel and attitudes of ghetto residents 
toward competition with members of the dominant society. 


WV) 1974, 270 pp., $11.501 £5.50 
: A volume in the Studies in Anthropology Series 


Academic Press, Inc. 
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EDUCATING EXCEPTIONAL CHILDREN IN A CHANG- 
ING SOCIETY by Harold D. Love, State College of 
Arkansas, Conway. Basically about children who are excep- 
tional, but who still function in our society with the help of 
many people. Although these children are exceptional, they 
still have basic human motives, drives, and needs, making 
their only difference the need for additional help. It has 
long been the wish of the author that a book be written 
which could be used for undergraduates and also for 
beginning graduate students. Most books acceptable for 
graduate students are too difficult for the undergraduate. 
This does not have to be, and Doctor Love has proved this 
by writing a book which is not too difficult for undergrad- 
uates, nor too easy for graduate students. The author has 
tried not only to write an integrated and unified book 
about exceptional children, but to place the parents of such 
children and auxiliary personnel and agents in the proper 
perspective, This makes the emphasis in this text different 

_ from most. The tone of complacency in special education is 

'' discussed, along with the controversy surrounding methods. 
used in educating exceptional children. This book deals not 
only with children who have mental and physical problems, 
but also with children who are intellectually gifted and 
need special attention to develop their talents. It also deals 
with the child who has average or above average intelli- 
gence, but who still suffers from educational problems. '74, 
264 pp., 19 il., 7 tables, $9.75 


ISSUES AND TRENDS IN BEHAVIOR THERAPY edited 
by Henry E. Adams, Univ. of Georgia, Athens, and Irving P. 
Unikel, Veterans Administration Hospital, Atlanta. (10 
Contributors) The purpose of this book is to evaluate 
current issues and trends in the area of behavior therapy, A 
paragraph or two of introductory material preceding each 
chapter will fit its contents into the overall scheme. The 
beginning chapters are concerned Tespectively with issues in 
practice and research and to elaborate upon its aims and 
Purposes. Part I of this book, about 100 pages, is concerned 
with issues and theoretical developments. Several contribu- 
tors discuss issues in the practice of behavior therapy in a 
collaborative effort. Some papers subsequently deal in 
depth with one major issue and source of criticism for 
behavior therapy, its capability for dealing with complex 
cases without specific circumscribed behaviors or symp- 
toms, particularly “existential” problems. Following this, 
an overview of the research issues in the area are given. 
These include the general difficulties which accrue when 
matters are dismissed as being "academic", the need for the 
Study and integration of Physiological variables and the 
necessity for a focus on the development of behavior since 
the ways maladaptive behaviors are successfully treated are 
not necessarily indications of the manner in which these 
behaviors are acquired. The book displays the Scope of 
behavioral treatment. techniques by working with obses- 
sive-compulsive disorders, Finally, the evaluation of nonspe- 
cific treatment effects is discussed. An original and stimu- 
lating work on the influence of demand and contextual 
variables in behavioral fear assessment. 773, 288 pp., 38 il. 
44 tables, cloth-$10.95, Paper-$7.95 A ; 
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CLASSROOM DISCIPLINE: A Positive Approach by Susa 
Bray Stainback and William, Clarence Stainback, both o; 
Virginia State College, Petersburg. The primary focus oi 
this book is on straightforward, specific, workable tech 
niques for classroom control. In essence, the author 
outline exactly what has to be done in the real, 
classroom situation to control disruptive behavior 
book's purpose is to provide prospective as well a 
practicing classroom teachers, supervisors and principal 
with techniques that can be used to humanize classroor 
management. Viable alternatives to outmoded techniques 
of classroom control, such as threats and corporal punish 
ment, are also presented. The impetus for this book's 
development was the result of the authors' experiences in 
working as supervisors of beginning and experienced 
teachers. The book is divided into four parts. A comparison 
of the behavior modification strategy adopted by the 
authors with two other popular strategies (psychodynamic and 
sensory-neurological) is presented in Part I. Preventive 
techniques for maintaining classroom discipline as well as 
measures that can be employed to deal with disruptive 
behavior after it occurs are outlined in Part II. Part III 
includes techniques for changing the behavior of extremely 
disruptive students. In addition, various reinforcers that can 
be used in the school setting and the ethical issues involved 
in using reinforcement principles are discussed. This text is 
an attempt to fill this void in the literature. Selected articles 
that present ways of preventing and/or dealing with 
disruptive behavior are provided in Part IV. '74, 180 pp., 
$5.50, paper 


AUDITORY PERCEPTUAL DISORDERS AND REMEDI- 
ATION by Bernice E. Heasley, Human Development and 
Counseling Associates, North Canton, Ohio. Foreword by 
Kendall K. Ward. This text is written in clear, concise 
language with terms being fully explained so the content 
may be readily understood by both the professional and lay 
reader. Educators and families of children or adults who 
exhibit learning or social problems related to defective 
processing of auditory stimuli will find this book quite 
useful. Specific educators who will find the book interest- 
ing include speech therapists, hearing and language patholo- 
gists, audiologists, otologists, special education teachers, 
and teachers in normal classrooms which contain a percent- 
age of children who exhibit auditory perceptual disorders in 
varying degrees. Based on the Developmental Model of 
Speech and Language, the book provides auditory percep- 
tual training lessons which follow a carefully structured 
hierarchy in which lessons are graduated in terms of 
difficulty, length of lesson time, and numbers of processes 
to be introduced. This structure closely approximates the 
normal sequential landmarks of childhood development in 
the areas of hearing, language, and speech. Remediation 
tasks are based on common environmental and speech 
sounds and require no expensive equipment. Topics include 
components of auditory processing and criteria for recog- 
nizing symptoms of auditory perceptual problems. '74, 128 
pp., 20 il., 1 table, $8.95 